Autoscaling is one of Kubernetes’ core features: the ability to dynamically adjust the number of running pods in a cluster based on demand. This feature, called the Horizontal Pod Autoscaler (HPA), is essential for maintaining performance and efficiency in a Kubernetes environment, automatically adding pods to handle increased load and removing them when demand decreases.
What is HPA?
The Horizontal Pod Autoscaler adjusts the number of pods in a Deployment, StatefulSet, or ReplicaSet based on observed metrics such as average CPU utilization, memory usage, or custom metrics you define. This flexibility allows for both efficient resource usage and optimal application performance.
Note: HPA does not support scaling objects that are not scalable, such as DaemonSets.
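Under the hood, the HPA controller periodically compares the observed metric to its target and computes the desired replica count using the following formula from the Kubernetes documentation:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

For example, with a 70% CPU target and a measured average of 140% across 2 pods, the controller scales to ceil(2 * 140 / 70) = 4 replicas.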
Creating an HPA
You can easily set up an HPA using kubectl with the autoscale command. Below is an example of scaling a web application based on CPU utilization:
kubectl autoscale deployment webapp \
--cpu-percent=70 \
--min=1 \
--max=5
In this example:
- The HPA scales the webapp deployment.
- The scaling triggers when average CPU utilization exceeds 70%.
- The number of pods will vary between 1 and 5 depending on the load.
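Two prerequisites apply when scaling on CPU utilization: a metrics pipeline such as metrics-server must be running in the cluster, and the target pods must declare CPU requests, because utilization is measured as a percentage of the requested amount. Below is a minimal sketch of the relevant part of the webapp Deployment’s pod template; the container name, image, and request value are illustrative assumptions:

spec:
  template:
    spec:
      containers:
      - name: webapp          # hypothetical container name
        image: webapp:1.0     # hypothetical image
        resources:
          requests:
            cpu: 200m         # the 70% utilization target is measured against this request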
You can verify the HPA status using the kubectl get hpa command:
kubectl get hpa
The output will display something like this:
NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
webapp   Deployment/webapp   0%/70%    1         5         1          110s
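For more insight into scaling decisions, including the conditions the controller evaluated and recent scaling events, you can describe the HPA:

kubectl describe hpa webapp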
Declarative HPA with YAML
You can also define an HPA declaratively in a YAML manifest. Below is an example using the autoscaling/v1 API, which supports scaling on CPU utilization only:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
To apply this configuration, use the following command:
kubectl apply -f webapp-hpa.yaml
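You can then confirm that the HPA was created and is tracking its target:

kubectl get hpa webapp-hpa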
Testing the Autoscaler
You can simulate load on the application to observe how the HPA reacts. Use the following command to generate load (it assumes the deployment is exposed through a Service named webapp):
kubectl run -i --tty load-generator --rm \
--image=busybox:1.28 \
--restart=Never \
-- /bin/sh -c "while sleep 0.01; do wget -q -O- http://webapp; done"
Monitor the HPA from a separate terminal as it adjusts to the increased load:
kubectl get hpa --watch
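Once you have seen the HPA scale up, stop the load generator with Ctrl+C; the --rm flag removes the load-generator pod when it exits.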
The output shows the HPA raising the pod count while CPU usage is high, then returning to the minimum after the load stops:
NAME     REFERENCE           TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
webapp   Deployment/webapp   249%/70%   1         5         4          57m
webapp   Deployment/webapp   0%/70%     1         5         1          65m
Autoscaling with Multiple Metrics
With the autoscaling/v2 API version, Kubernetes allows you to scale based on multiple metrics, including both CPU and memory:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 10Mi
This setup scales the webapp deployment based on both CPU and memory usage: the HPA computes a desired replica count for each metric independently and scales to the highest of them. You can also define custom metrics for more advanced scaling.
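The autoscaling/v2 API also exposes an optional behavior field for tuning how aggressively the HPA scales in each direction. The sketch below limits scale-down to one pod per minute; the values are illustrative rather than recommendations (by default, the HPA already waits through a 300-second stabilization window before scaling down):

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # require 5 minutes of stable metrics before scaling down
      policies:
      - type: Pods
        value: 1                        # remove at most one pod...
        periodSeconds: 60               # ...per 60-second period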
Custom Metrics
In addition to CPU and memory, Kubernetes HPA supports custom metrics, such as pod and object metrics, which allow scaling based on more application-specific measurements. Serving these metrics requires a metrics adapter, such as the Prometheus Adapter, that implements the custom metrics API.
- Pod Metrics: Scale based on metrics like network traffic or requests per second for individual pods:
- type: Pods
  pods:
    metric:
      name: packets-per-second
    target:
      type: AverageValue
      averageValue: 2k
- Object Metrics: Scale based on metrics tied to Kubernetes objects, such as ingress traffic:
- type: Object
  object:
    metric:
      name: requests-per-second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: main-route
    target:
      type: Value
      value: 2k
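The autoscaling/v2 API additionally supports External metrics for measurements that originate outside the cluster, such as the depth of a queue in a hosted message broker. The sketch below assumes an external metrics adapter is installed and serves a queue-length metric; the metric name and selector are hypothetical:

- type: External
  external:
    metric:
      name: queue_messages_ready   # hypothetical metric served by the adapter
      selector:
        matchLabels:
          queue: worker_tasks
    target:
      type: AverageValue
      averageValue: 30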
Conclusion
The Horizontal Pod Autoscaler (HPA) is an essential tool in Kubernetes for ensuring your applications scale dynamically based on demand. With support for multiple resource metrics and custom metrics, HPA can optimize resource allocation and improve application resilience under fluctuating loads. Using the latest features in Kubernetes autoscaling, you can scale more intelligently and efficiently than ever before.
Latest Updates in Kubernetes Autoscaling
- autoscaling/v2 API: Introduced multiple metric support (both resource and custom metrics).
- Custom Metrics API: Expanded to allow pod and object-level metrics for scaling.
- Kubernetes 1.27 Update: Enhanced stability and scaling performance for large workloads.