Autoscaling is one of Kubernetes’ core features: the ability to dynamically adjust the number of running pods in a cluster based on demand. This feature, called the Horizontal Pod Autoscaler (HPA), is essential for maintaining performance and efficiency in a Kubernetes environment, automatically adding pods to handle increased load and removing them when demand decreases.
What is HPA?
The Horizontal Pod Autoscaler adjusts the number of pods in a Deployment, StatefulSet, or ReplicaSet based on observed metrics such as average CPU utilization, memory usage, or custom metrics you define. This flexibility allows for both efficient resource usage and optimal application performance.
Note: HPA does not support scaling objects that are not scalable, such as DaemonSets.
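Under the hood, the HPA controller periodically compares the observed metric to its target and computes the desired replica count using the following formula from the Kubernetes documentation:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

For example, with a 70% CPU target and a measured average of 140% across 2 pods, the controller scales to ceil(2 * 140 / 70) = 4 replicas.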
Creating an HPA
You can easily set up an HPA using kubectl with the autoscale command. Below is an example of scaling a web application based on CPU utilization:
kubectl autoscale deployment webapp \
--cpu-percent=70 \
--min=1 \
--max=5
In this example:
- The HPA scales the webapp deployment.
- The scaling triggers when average CPU utilization exceeds 70%.
- The number of pods will vary between 1 and 5 depending on the load.
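Two prerequisites apply when scaling on CPU utilization: a metrics pipeline such as metrics-server must be running in the cluster, and the target pods must declare CPU requests, because utilization is measured as a percentage of the requested amount. Below is a minimal sketch of the relevant part of the webapp Deployment’s pod template; the container name, image, and request value are illustrative assumptions:

spec:
  template:
    spec:
      containers:
      - name: webapp          # hypothetical container name
        image: webapp:1.0     # hypothetical image
        resources:
          requests:
            cpu: 200m         # the 70% utilization target is measured against this request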
You can verify the HPA status using the kubectl get hpa command:
kubectl get hpa
The output will display something like this:
NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
webapp   Deployment/webapp   0%/70%    1         5         1          110s
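For more insight into scaling decisions, including the conditions the controller evaluated and recent scaling events, you can describe the HPA:

kubectl describe hpa webapp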
Declarative HPA with YAML
You can also define an HPA declaratively in a YAML manifest. Below is an example using the autoscaling/v1 API, which supports scaling on CPU utilization only:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
To apply this configuration, use the following command:
kubectl apply -f webapp-hpa.yaml
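You can then confirm that the HPA was created and is tracking its target:

kubectl get hpa webapp-hpa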
Testing the Autoscaler
You can simulate load on the application to observe how the HPA reacts. Use the following command to generate load (it assumes the deployment is exposed through a Service named webapp):
kubectl run -i --tty load-generator --rm \
--image=busybox:1.28 \
--restart=Never \
-- /bin/sh -c "while sleep 0.01; do wget -q -O- http://webapp; done"
Monitor the HPA from a separate terminal as it adjusts to the increased load:
kubectl get hpa --watch
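Once you have seen the HPA scale up, stop the load generator with Ctrl+C; the --rm flag removes the load-generator pod when it exits.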
The output shows the HPA raising the pod count while CPU usage is high, then returning to the minimum after the load stops:
NAME     REFERENCE           TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
webapp   Deployment/webapp   249%/70%   1         5         4          57m
webapp   Deployment/webapp   0%/70%     1         5         1          65m
Autoscaling with Multiple Metrics
With the autoscaling/v2 API version, Kubernetes allows you to scale based on multiple metrics, including both CPU and memory:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 10Mi
This setup scales the webapp deployment based on both CPU and memory usage: the HPA computes a desired replica count for each metric independently and scales to the highest of them. You can also define custom metrics for more advanced scaling.
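The autoscaling/v2 API also exposes an optional behavior field for tuning how aggressively the HPA scales in each direction. The sketch below limits scale-down to one pod per minute; the values are illustrative rather than recommendations (by default, the HPA already waits through a 300-second stabilization window before scaling down):

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # require 5 minutes of stable metrics before scaling down
      policies:
      - type: Pods
        value: 1                        # remove at most one pod...
        periodSeconds: 60               # ...per 60-second period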
Custom Metrics
In addition to CPU and memory, Kubernetes HPA supports custom metrics, such as pod and object metrics, which allow scaling based on more application-specific measurements. Serving these metrics requires a metrics adapter, such as the Prometheus Adapter, that implements the custom metrics API.
- Pod Metrics: Scale based on metrics like network traffic or requests per second for individual pods:
- type: Pods
  pods:
    metric:
      name: packets-per-second
    target:
      type: AverageValue
      averageValue: 2k
- Object Metrics: Scale based on metrics tied to Kubernetes objects, such as ingress traffic:
- type: Object
  object:
    metric:
      name: requests-per-second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: main-route
    target:
      type: Value
      value: 2k
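The autoscaling/v2 API additionally supports External metrics for measurements that originate outside the cluster, such as the depth of a queue in a hosted message broker. The sketch below assumes an external metrics adapter is installed and serves a queue-length metric; the metric name and selector are hypothetical:

- type: External
  external:
    metric:
      name: queue_messages_ready   # hypothetical metric served by the adapter
      selector:
        matchLabels:
          queue: worker_tasks
    target:
      type: AverageValue
      averageValue: 30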
Conclusion
The Horizontal Pod Autoscaler (HPA) is an essential tool in Kubernetes for ensuring your applications scale dynamically based on demand. With support for multiple resource metrics and custom metrics, HPA can optimize resource allocation and improve application resilience under fluctuating loads. Using the latest features in Kubernetes autoscaling, you can scale more intelligently and efficiently than ever before.
Latest Updates in Kubernetes Autoscaling
- autoscaling/v2 API: Introduced multiple metric support (both resource and custom metrics).
- Custom Metrics API: Expanded to allow pod and object-level metrics for scaling.
- Kubernetes 1.27 Update: Enhanced stability and scaling performance for large workloads.