{"id":127,"date":"2023-06-21T16:01:08","date_gmt":"2023-06-21T14:01:08","guid":{"rendered":"https:\/\/blog.openshift.one\/?p=127"},"modified":"2023-06-29T10:44:41","modified_gmt":"2023-06-29T08:44:41","slug":"horizontal-pod-autoscaler","status":"publish","type":"post","link":"https:\/\/blog.openshift.one\/index.php\/2023\/06\/21\/horizontal-pod-autoscaler\/","title":{"rendered":"Horizontal Pod AutoScaler"},"content":{"rendered":"\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>OpenShift allows you to run &#8211; alongside Virtual Machines &#8211; containerised workloads. One of the biggest benefits of containers is the ability to split large projects into small containers (microservices), which enables you to develop and manage them independently of each other. One of the most important aspects of service management is scaling it according to the load to keep its users happy. You can try to predict the load and provide enough resources to handle it, or you can rely on automated scaling. Manual scaling can be challenging and often leads to under- or over-capacity &#8211; situations where you provide too few or too many resources for the load. Both situations can cost you reputation, budget, or both. This is where automatic scaling comes to the rescue! 
\ud83d\ude42<\/p>\n\n\n\n<p>OpenShift provides the following built-in autoscaling solutions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>HorizontalPodAutoscaler (HPA) &#8211; adds or removes pods based on simple CPU and memory usage metrics<\/li>\n\n\n\n<li>VerticalPodAutoscaler (VPA) &#8211; updates the resource limits and requests according to historical and current CPU and memory usage<\/li>\n\n\n\n<li>Custom Metrics Autoscaler Operator &#8211; increases or decreases the number of pods based on custom metrics (not just memory or CPU)<\/li>\n<\/ul>\n\n\n\n<p>In this post I will focus on the first one &#8211; HorizontalPodAutoscaler (HPA).<\/p>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Metrics in OpenShift<\/h3>\n\n\n\n<p>Out of the box, OpenShift collects CPU and memory usage metrics from running workloads. You can easily view them with the <code>oc adm top pods<\/code> or <code>oc describe PodMetrics<\/code> commands, for instance:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc adm top pods\nNAME                    CPU(cores)   MEMORY(bytes)\nmyapp-95bb75667-hq7fk   230m         15Mi\n\n$ oc describe PodMetrics myapp-95bb75667-hq7fk\n(...)\nContainers:\n  Name:  myapp\n  Usage:\n    Cpu:     230m\n    Memory:  15768Ki\n(...)<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Additionally, you can see simple graphs in the OpenShift WebUI:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1626\" height=\"1302\" src=\"https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-14-at-13.54.07.png\" alt=\"\" class=\"wp-image-128\" srcset=\"https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-14-at-13.54.07.png 1626w, 
https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-14-at-13.54.07-300x240.png 300w, https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-14-at-13.54.07-1024x820.png 1024w, https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-14-at-13.54.07-768x615.png 768w, https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-14-at-13.54.07-1536x1230.png 1536w\" sizes=\"auto, (max-width: 1626px) 100vw, 1626px\" \/><\/figure>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Creating &#8220;MyApp&#8221; test workload<\/h3>\n\n\n\n<p>To have some fun with HPA let&#8217;s create an example application. To make things easier I created the <code>myapp<\/code> image from the following simple Containerfile:<\/p>\n\n\n\n<pre class=\"wp-block-code has-foreground-color has-tertiary-background-color has-text-color has-background\"><code>FROM fedora:38\nRUN dnf install -y stress-ng pv\nCMD &#91;\"\/usr\/bin\/sleep\", \"infinity\"]<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ podman build --arch x86_64 -t default-route-openshift-image-registry.apps.ocp4.openshift.one:443\/rafal-hpa\/myapp:v1.0 .\nSTEP 1\/3: FROM fedora:38\nSTEP 2\/3: RUN dnf install -y stress-ng pv\nFedora 38 - x86_64                              4.5 MB\/s |  83 MB     00:18\nFedora 38 openh264 (From Cisco) - x86_64        1.3 kB\/s | 2.5 kB     00:01\nFedora Modular 38 - x86_64                      1.3 MB\/s | 2.8 MB     00:02\nFedora 38 - x86_64 - Updates                    2.1 MB\/s |  24 MB     00:11\nFedora Modular 38 - x86_64 - Updates            1.2 MB\/s | 2.1 MB     00:01\nLast metadata expiration check: 0:00:02 ago on Thu Jun 15 09:31:49 2023.\nDependencies 
resolved.\n================================================================================\n Package             Architecture  Version                  Repository     Size\n================================================================================\nInstalling:\n pv                  x86_64        1.6.20-6.fc38            fedora         66 k\n stress-ng           x86_64        0.15.06-1.fc38           fedora        2.4 M\nInstalling dependencies:\n Judy                x86_64        1.0.5-31.fc38            fedora        132 k\n libbsd              x86_64        0.11.7-4.fc38            fedora        112 k\n libmd               x86_64        1.0.4-3.fc38             fedora         39 k\n lksctp-tools        x86_64        1.0.19-3.fc38            fedora         92 k\n\nTransaction Summary\n================================================================================\nInstall  6 Packages\n\nTotal download size: 2.8 M\nInstalled size: 10 M\nDownloading Packages:\n(1\/6): libbsd-0.11.7-4.fc38.x86_64.rpm          634 kB\/s | 112 kB     00:00\n(2\/6): Judy-1.0.5-31.fc38.x86_64.rpm            414 kB\/s | 132 kB     00:00\n(3\/6): libmd-1.0.4-3.fc38.x86_64.rpm            117 kB\/s |  39 kB     00:00\n(4\/6): lksctp-tools-1.0.19-3.fc38.x86_64.rpm    612 kB\/s |  92 kB     00:00\n(5\/6): pv-1.6.20-6.fc38.x86_64.rpm              2.1 MB\/s |  66 kB     00:00\n(6\/6): stress-ng-0.15.06-1.fc38.x86_64.rpm      4.5 MB\/s | 2.4 MB     00:00\n--------------------------------------------------------------------------------\nTotal                                           1.8 MB\/s | 2.8 MB     00:01\nRunning transaction check\nTransaction check succeeded.\nRunning transaction test\nTransaction test succeeded.\nRunning transaction\n  Preparing        :                                                        1\/1\n  Installing       : lksctp-tools-1.0.19-3.fc38.x86_64                      1\/6\n  Installing       : libmd-1.0.4-3.fc38.x86_64                              2\/6\n  
Installing       : libbsd-0.11.7-4.fc38.x86_64                            3\/6\n  Installing       : Judy-1.0.5-31.fc38.x86_64                              4\/6\n  Installing       : stress-ng-0.15.06-1.fc38.x86_64                        5\/6\n  Installing       : pv-1.6.20-6.fc38.x86_64                                6\/6\n  Running scriptlet: pv-1.6.20-6.fc38.x86_64                                6\/6\n  Verifying        : Judy-1.0.5-31.fc38.x86_64                              1\/6\n  Verifying        : libbsd-0.11.7-4.fc38.x86_64                            2\/6\n  Verifying        : libmd-1.0.4-3.fc38.x86_64                              3\/6\n  Verifying        : lksctp-tools-1.0.19-3.fc38.x86_64                      4\/6\n  Verifying        : pv-1.6.20-6.fc38.x86_64                                5\/6\n  Verifying        : stress-ng-0.15.06-1.fc38.x86_64                        6\/6\n\nInstalled:\n  Judy-1.0.5-31.fc38.x86_64          libbsd-0.11.7-4.fc38.x86_64\n  libmd-1.0.4-3.fc38.x86_64          lksctp-tools-1.0.19-3.fc38.x86_64\n  pv-1.6.20-6.fc38.x86_64            stress-ng-0.15.06-1.fc38.x86_64\n\nComplete!\n--&gt; cb48d092d119\nSTEP 3\/3: CMD &#91;\"\/usr\/bin\/sleep\", \"infinity\"]\nCOMMIT myapp:v1.0\n--&gt; af37513c7c54\nSuccessfully tagged localhost\/myapp:v1.0\naf37513c7c542a624b74f578b0ec3a54ac63b3e9b01e27019e8a67e718d2eb07<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Push it to my OpenShift&#8217;s internal registry:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ podman push default-route-openshift-image-registry.apps.ocp4.openshift.one:443\/rafal-hpa\/myapp:v1.0\nGetting image source signatures\nCopying blob sha256:fb7b7e1a70dd14da904e1a241e8ed152ed9cc7153c9bd16f95db33e77891da6b\nCopying blob sha256:dda8af7b00b7fe3b1c22f7b09aace1a2d0a32018905f3beaaa53f45ad97a3646\nCopying config 
sha256:af37513c7c542a624b74f578b0ec3a54ac63b3e9b01e27019e8a67e718d2eb07\nWriting manifest to image destination\nStoring signatures<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>So now I can create a Deployment from it:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc new-app --image default-route-openshift-image-registry.apps.ocp4.openshift.one\/rafal-hpa\/myapp:v1.0 --name myapp --insecure-registry=true\n--&gt; Found container image af37513 (2 hours old) from default-route-openshift-image-registry.apps.ocp4.openshift.one for \"default-route-openshift-image-registry.apps.ocp4.openshift.one\/rafal-hpa\/myapp:v1.0\"\n\n    * An image stream tag will be created as \"myapp:v1.0\" that will track this image\n\n--&gt; Creating resources ...\n    deployment.apps \"myapp\" created\n--&gt; Success\n    Run 'oc status' to view your app.<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>At this stage I have a single pod running, controlled by the Deployment:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc get pods,deployments\nNAME                         READY   STATUS    RESTARTS   AGE\npod\/myapp-77dbd8bc94-v2fjc   1\/1     Running   0          94s\n\nNAME                    READY   UP-TO-DATE   AVAILABLE   AGE\ndeployment.apps\/myapp   1\/1     1            1           95s<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Since it is very small and simple, it does not use any significant amount of resources for now:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc adm top pods\nNAME                     CPU(cores)   
MEMORY(bytes)\nmyapp-77dbd8bc94-v2fjc   0m           0Mi\n\n$ oc describe podmetrics myapp-77dbd8bc94-v2fjc\nName:         myapp-77dbd8bc94-v2fjc\nNamespace:    rafal-hpa\nContainers:\n  Name:  myapp\n  Usage:\n    Cpu:     0\n    Memory:  328Ki\nKind:        PodMetrics\nEvents:                &lt;none&gt;<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Requests and Limits<\/h3>\n\n\n\n<p>For each compute resource, a container may specify a resource <strong>Request<\/strong> and <strong>Limit<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"360\" height=\"460\" src=\"https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Blank-diagram-1.png\" alt=\"\" class=\"wp-image-142\" srcset=\"https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Blank-diagram-1.png 360w, https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Blank-diagram-1-235x300.png 235w\" sizes=\"auto, (max-width: 360px) 100vw, 360px\" \/><\/figure>\n\n\n\n<p><br><strong>Requests<\/strong> are used during scheduling to find a suitable compute node that can provide the requested amount of resources (CPU, memory). They are also used by the HorizontalPodAutoscaler to calculate current resource usage versus expected usage, expressed as a percentage. A container can go above the values described by its <strong>Requests<\/strong>, but their availability is not guaranteed, so it may happen that it won&#8217;t be able to get more CPU or memory on the current node. <br><strong>Limits<\/strong>, on the other hand, specify the maximum amount of resources (CPU, memory) that a container may consume. 
The container won&#8217;t be able to use more than the <strong>Limit<\/strong> specifies.<\/p>\n\n\n\n<p>If one configures <strong>Limits<\/strong> but omits <strong>Requests<\/strong>, the <strong>Request<\/strong> will automatically be set to the <strong>Limit<\/strong> value.<\/p>\n\n\n\n<p><strong>You have to set Requests if you want to configure the HorizontalPodAutoscaler to scale your application based on the percentage of resource usage<\/strong>. It is not required, though, if you use an exact value to describe memory or CPU usage, such as 500m (millicores) or 256Mi (mebibytes).<\/p>\n\n\n\n<p>A more detailed definition of Requests and Limits can be found in this document: <a rel=\"noreferrer noopener\" href=\"https:\/\/docs.openshift.com\/container-platform\/4.13\/nodes\/clusters\/nodes-cluster-overcommit.html#nodes-cluster-overcommit-resource-requests_nodes-cluster-overcommit\" target=\"_blank\">Resource requests and overcommitment<\/a><\/p>\n\n\n\n<p>For my example MyApp I will set the requests to 500m of CPU and 128Mi of memory, and the limits to 1000m CPU and 256Mi of memory. These settings mean that the scheduler will try to find a node which has at least 500m CPU and 128Mi of memory available; these values will also be taken into account when the HPA is configured to use a percentage (%) of the requested values. Additionally, my example MyApp will be capped at 1000m CPU (one core) and 256Mi of memory. 
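<\/p>\n\n\n\n<p>The same requests and limits can also be declared directly in the Deployment manifest. Below is a minimal sketch of the relevant fragment (only the <code>resources<\/code> section matters here; the rest of the spec is abbreviated):<\/p>\n\n\n\n<pre class=\"wp-block-code has-foreground-color has-tertiary-background-color has-text-color has-background\"><code>apiVersion: apps\/v1\nkind: Deployment\nmetadata:\n  name: myapp\nspec:\n  template:\n    spec:\n      containers:\n      - name: myapp\n        resources:\n          requests:\n            cpu: 500m      # used by the scheduler and by percentage-based HPA\n            memory: 128Mi\n          limits:\n            cpu: 1000m     # hard cap: one core\n            memory: 256Mi<\/code><\/pre>\n\n\n\n<p>Applying such a fragment with <code>oc apply -f<\/code> achieves the same result as the <code>oc set resources<\/code> command that follows.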
<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc set resources deployment myapp --requests=cpu=500m,memory=128Mi --limits=cpu=1000m,memory=256Mi\ndeployment.apps\/myapp resource requirements updated<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Autoscaling based on CPU usage<\/h3>\n\n\n\n<p>The OpenShift CLI (oc) comes with a handy command capable of setting up CPU-based horizontal pod autoscaling straight from the command line (please note it can currently only configure CPU-based HPA; no option for memory is available at the time of writing, with oc CLI version 4.13.4):<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>oc autoscale (-f FILENAME | TYPE NAME | TYPE\/NAME) &#91;--min=MINPODS] --max=MAXPODS &#91;--cpu-percent=CPU] &#91;options]<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Keeping in mind that in the previous step I configured the CPU request to 500m, I want to scale out my deployment when the average CPU usage across all running pods goes above 50%, i.e. 250m. 
I also want to ensure that at least two replicas of my application are running for availability reasons, and that it won&#8217;t go above 10 replicas.<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc autoscale deployment myapp --min=2 --max=10 --cpu-percent=50\nhorizontalpodautoscaler.autoscaling\/myapp autoscaled<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Immediately after running the command above, OpenShift will start an additional copy of my pod to satisfy the requirement of a minimum of 2 running replicas:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc get pods\nNAME                     READY   STATUS    RESTARTS   AGE\nmyapp-6677fd6f55-pdq44   1\/1     Running   0          27s\nmyapp-66d77bbf56-5g9lj   1\/1     Running   0          20m<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>After a short while it will also start monitoring CPU usage and reporting it under the HPA object:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc get hpa\nNAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE\nmyapp   Deployment\/myapp   0%\/50%    2         10        2          2m45s\n\n$ oc describe hpa\nName:                                                  myapp\nNamespace:                                             rafal-hpa\nLabels:                                                &lt;none&gt;\nAnnotations:                                           &lt;none&gt;\nCreationTimestamp:                                     Wed, 21 Jun 2023 14:58:20 +0200\nReference:                                             Deployment\/myapp\nMetrics:                                               ( 
current \/ target )\n  resource cpu on pods  (as a percentage of request):  0% (0) \/ 50%\nMin replicas:                                          2\nMax replicas:                                          10\nDeployment pods:                                       2 current \/ 2 desired\nConditions:\n  Type            Status  Reason               Message\n  ----            ------  ------               -------\n  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation\n  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)\n  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range\nEvents:\n  Type    Reason             Age    From                       Message\n  ----    ------             ----   ----                       -------\n  Normal  SuccessfulRescale  2m32s  horizontal-pod-autoscaler  New size: 2; reason: Current number of replicas below Spec.MinReplicas<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Since the pod I run just sleeps (remember <code>CMD [\"\/usr\/bin\/sleep\", \"infinity\"]<\/code> from the Containerfile? What a life! \ud83d\ude42 ) it reports 0% of usage. 
Let&#8217;s put some load there to wake up the HPA (please note the change of directory to \/tmp &#8211; stress-ng needs write permission to the current directory):<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc rsh myapp-6677fd6f55-pdq44\nsh-5.2$ cd \/tmp\nsh-5.2$ stress-ng -c 1\nstress-ng: info:  &#91;19] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor\nstress-ng: info:  &#91;19] dispatching hogs: 1 cpu<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>In another terminal, run <code>oc get hpa -w<\/code> to watch the HPA status changes:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc get hpa -w\nNAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE\nmyapp   Deployment\/myapp   0%\/50%    2         10        2          44m\nmyapp   Deployment\/myapp   60%\/50%   2         10        2          45m\nmyapp   Deployment\/myapp   60%\/50%   2         10        3          45m\nmyapp   Deployment\/myapp   40%\/50%    2         10        5          45m\nmyapp   Deployment\/myapp   46%\/50%    2         10        5          45m<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>By default, the HPA scale-down policy waits 300 seconds before pods are removed. This is just to avoid unnecessary ping-pong of adding and removing pods whenever the load fluctuates a bit. For demo purposes I modified this default policy and configured it to 15 seconds. 
Therefore I don&#8217;t have to wait too long after the load decreases to see the HPA removing the pods.<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc patch hpa myapp -p '{\"spec\": {\"behavior\": {\"scaleDown\": {\"stabilizationWindowSeconds\": 15 }}}}'\nhorizontalpodautoscaler.autoscaling\/myapp patched<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>I cancelled the stress-ng process started earlier, so the load went down and the HPA removed all extra pods, keeping just 2 of them as requested.<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>myapp   Deployment\/myapp   48%\/50%    2         10        5          46m\nmyapp   Deployment\/myapp   35%\/50%    2         10        5          47m\nmyapp   Deployment\/myapp   0%\/50%     2         10        2          47m<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>This concludes the CPU auto-scaling exercise. 
Please remember you can also track the metrics and events using the WebUI or the <code>oc get events -w<\/code> command, among others.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"471\" src=\"https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-21-at-15.58.07-1024x471.png\" alt=\"\" class=\"wp-image-146\" srcset=\"https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-21-at-15.58.07-1024x471.png 1024w, https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-21-at-15.58.07-300x138.png 300w, https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-21-at-15.58.07-768x353.png 768w, https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-21-at-15.58.07-1536x707.png 1536w, https:\/\/blog.openshift.one\/wp-content\/uploads\/2023\/06\/Screenshot-2023-06-21-at-15.58.07-2048x942.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Autoscaling based on Memory usage<\/h3>\n\n\n\n<p>Autoscaling based on memory works in a similar fashion to the CPU-based one; however, the oc CLI does not provide an option to set it up straight from the command line. Therefore my approach here is to first create an autoscaler with the minimum and maximum number of pods and then edit it to add memory-based scaling. 
For instance:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc autoscale deployment myapp --min=2 --max=10 -o yaml --dry-run=client\napiVersion: autoscaling\/v1\nkind: HorizontalPodAutoscaler\nmetadata:\n  creationTimestamp: null\n  name: myapp\nspec:\n  maxReplicas: 10\n  minReplicas: 2\n  scaleTargetRef:\n    apiVersion: apps\/v1\n    kind: Deployment\n    name: myapp\nstatus:\n  currentReplicas: 0\n  desiredReplicas: 0<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>and change it by adding the <code>spec.metrics<\/code> section as below. Note that the <code>metrics<\/code> field only exists in the <code>autoscaling\/v2<\/code> API, so the apiVersion has to be updated as well:<\/p>\n\n\n\n<pre class=\"wp-block-code has-foreground-color has-tertiary-background-color has-text-color has-background\"><code>apiVersion: autoscaling\/v2\nkind: HorizontalPodAutoscaler\nmetadata:\n  creationTimestamp: null\n  name: myapp\nspec:\n  maxReplicas: 10\n  minReplicas: 2\n  scaleTargetRef:\n    apiVersion: apps\/v1\n    kind: Deployment\n    name: myapp\n<strong>  metrics:\n  - resource:\n      name: memory\n      target:\n        averageUtilization: 50\n        type: Utilization\n    type: Resource<\/strong><\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>The above HPA example will trigger a scale-out of the <code>myapp<\/code> Deployment if the average memory utilisation across all pods managed by the deployment goes above 50% of the requested memory. Please remember to set memory requests accordingly. 
For this example I set it the same way as before:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc set resources deployment myapp --requests=cpu=500m,memory=128Mi --limits=cpu=1000m,memory=256Mi\ndeployment.apps\/myapp resource requirements updated<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Let&#8217;s give it a try and allocate 200Mi of memory in one of the pods of the myapp Deployment for 5 minutes:<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc get pods\nNAME                     READY   STATUS    RESTARTS   AGE\nmyapp-6677fd6f55-5622v   1\/1     Running   0          7m\nmyapp-6677fd6f55-hxsrn   1\/1     Running   0          7m\n$ oc rsh myapp-6677fd6f55-5622v\nsh-5.2$ cat &lt;( &lt;\/dev\/zero head -c 200m) &lt;(sleep 300) | tail\n<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>If you&#8217;re still watching the HPA, you should be able to observe that it notices the increase in memory usage and scales out the Deployment by adding pods, either up to the upper limit or until the average memory usage drops below 50% of the request. Then, when the memory usage drops after 300 seconds (5 minutes), it automatically scales the Deployment back in by reducing the number of pods (replicas).<\/p>\n\n\n\n<pre class=\"wp-block-code has-background-color has-foreground-background-color has-text-color has-background\"><code>$ oc get hpa myapp -w\nNAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE\nmyapp   Deployment\/myapp   0%\/50%    2         10        2          10m\nmyapp   Deployment\/myapp   35%\/50%   2         10        2          11m\nmyapp   Deployment\/myapp   79%\/50%   2         10        2          11m\nmyapp   Deployment\/myapp   79%\/50%   2         10        4         
 12m\nmyapp   Deployment\/myapp   53%\/50%   2         10        4          12m\nmyapp   Deployment\/myapp   40%\/50%   2         10        4          13m\nmyapp   Deployment\/myapp   40%\/50%   2         10        4          14m\nmyapp   Deployment\/myapp   0%\/50%    2         10        4          15m\nmyapp   Deployment\/myapp   0%\/50%    2         10        2          15m<\/code><\/pre>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">How does HorizontalPodAutoscaler calculate the % of resource usage?<\/h3>\n\n\n\n<p>Until now we&#8217;ve been using percentage-based resource usage against the CPU and memory requests, but it hasn&#8217;t been explained yet how it is calculated. Here is the rule:<\/p>\n\n\n\n<p><code>$TOTAL_USED \/ $REQUESTED \/ $NUM_OF_PODS = used_resources%<\/code><\/p>\n\n\n\n<p>For instance, in the last example I configured the memory request to 128Mi and the number of replicas to 2, and then allocated 200Mi of memory, therefore:<\/p>\n\n\n\n<p><code>200 \/ 128 \/ 2 = 0.78<\/code><\/p>\n\n\n\n<p>and that gave us the 79% of memory usage observed above (including the memory used by the &#8220;sleeping&#8221; pods). The same rule applies to CPU usage calculations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How does HorizontalPodAutoscaler calculate the number of required pods?<\/h3>\n\n\n\n<p>To calculate how many pods should be running to meet the HPA configuration criteria, the following rule is used:<\/p>\n\n\n\n<p><code>new_number_of_pods = ceil( current_number_of_pods * ( currentMetricValue \/ desiredMetricValue ) )<\/code><\/p>\n\n\n\n<p>where <code>currentMetricValue<\/code>, in case <code>averageUtilization<\/code> is used, is calculated as the average resource usage across all the pods. 
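<\/p>\n\n\n\n<p>Both rules are easy to reproduce with a few lines of shell arithmetic. This is just an illustrative sketch with my own variable names, not the actual controller code (which works on raw metric samples and applies extra tolerances):<\/p>\n\n\n\n<pre class=\"wp-block-code has-foreground-color has-tertiary-background-color has-text-color has-background\"><code>current_pods=2     # replicas currently running\nused_total=200     # total memory used across all pods (Mi)\nrequest=128        # memory request per pod (Mi)\ntarget_pct=50      # HPA target utilisation (%)\n\n# percentage of the request used, averaged across pods\necho $(( used_total * 100 \/ request \/ current_pods ))    # 78\n\n# new_number_of_pods = ceil( current_pods * current \/ desired ), via integer ceiling\ncurrent_metric=$(( used_total \/ current_pods ))          # 100\ndesired_metric=$(( request * target_pct \/ 100 ))         # 64\necho $(( (current_pods * current_metric + desired_metric - 1) \/ desired_metric ))    # 4<\/code><\/pre>\n\n\n\n<p>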
To put it into example:<\/p>\n\n\n\n<p>Two pods running, 200Mi of memory being used, a request of 128Mi and a threshold set to 50% give us:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>current_number_of_pods: <code>2<\/code><\/li>\n\n\n\n<li>currentMetricValue: <code>200 \/ 2 = 100<\/code><\/li>\n\n\n\n<li>desiredMetricValue: <code>128 * 0.5 = 64<\/code><\/li>\n\n\n\n<li>new_number_of_pods: <code>ceil( 2 * ( 100 \/ 64 ) ) = ceil( 3.125 ) = 4<\/code><\/li>\n<\/ul>\n\n\n\n<p>Therefore, to meet the HPA configuration requirement of 50% of the requested memory being used on average, the HPA should scale out the deployment to 4 pods in total.<\/p>\n\n\n\n<p>Once it is scaled out, it looks as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>current_number_of_pods: <code>4<\/code><\/li>\n\n\n\n<li>currentMetricValue: <code>200 \/ 4 = 50<\/code><\/li>\n\n\n\n<li>desiredMetricValue: <code>128 * 0.5 = 64<\/code><\/li>\n\n\n\n<li>new_number_of_pods: <code>ceil( 4 * ( 50 \/ 64 ) ) = ceil( 3.125 ) = 4<\/code><\/li>\n<\/ul>\n\n\n\n<p>So there is no need to scale out or in, since the number of pods is right to keep 50% of the 128Mi request used on average.<\/p>\n\n\n\n<p>If the memory usage drops at some point to, let&#8217;s say, 100Mi:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>current_number_of_pods: <code>4<\/code><\/li>\n\n\n\n<li>currentMetricValue: <code>100 \/ 4 = 25<\/code><\/li>\n\n\n\n<li>desiredMetricValue: <code>128 * 0.5 = 64<\/code><\/li>\n\n\n\n<li>new_number_of_pods: <code>ceil( 4 * ( 25 \/ 64 ) ) = ceil( 1.5625 ) = 2<\/code><\/li>\n<\/ul>\n\n\n\n<p>Therefore the deployment can be scaled back in to 2 pods.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenShift allows you to run &#8211; alongside Virtual Machines &#8211; containerised workloads. 
One of the biggest benefits of containers is the ability to split large projects into small containers (microservices), which enables you to develop and manage them independently of each other. One of the most important aspects of service management is to scale [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-127","post","type-post","status-publish","format-standard","hentry","category-openshift"],"_links":{"self":[{"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/posts\/127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/comments?post=127"}],"version-history":[{"count":21,"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/posts\/127\/revisions"}],"predecessor-version":[{"id":164,"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/posts\/127\/revisions\/164"}],"wp:attachment":[{"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/media?parent=127"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/categories?post=127"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.openshift.one\/index.php\/wp-json\/wp\/v2\/tags?post=127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}