Learn how to enable CPU and Memory HPA (Horizontal Pod Autoscaler) in Kubernetes to automatically scale your deployments based on resource utilization. This step-by-step guide includes Metrics Server installation, deployment creation, and HPA configuration.
Horizontal Pod Autoscaler (HPA) is a critical component in Kubernetes that automatically scales the number of pods in a replication controller, deployment, or replica set based on observed CPU utilization or other custom metrics. In this article, we will discuss how to enable CPU and memory-based HPA in Kubernetes.
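For reference, the HPA controller derives the desired replica count from the ratio between the observed metric and its target, roughly as described in the Kubernetes documentation:
desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )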
Prerequisites
Before proceeding, ensure that you have the following prerequisites:
- A Kubernetes cluster up and running.
- The kubectl command-line tool installed and configured to interact with your cluster.
Enabling Metrics Server
To enable HPA based on CPU and memory metrics, you must first deploy the Metrics Server in your cluster. Metrics Server collects resource metrics from Kubelets and exposes them via the Kubernetes API.
Deploy Metrics Server
To deploy Metrics Server, apply the latest release manifest:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verify Metrics Server Installation
To verify that Metrics Server is running, execute the following command:
kubectl get deployment metrics-server -n kube-system
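Once the Metrics Server deployment is ready, you can also confirm that resource metrics are actually being collected (the output will vary with your cluster):
kubectl top nodes
kubectl top pods -n kube-system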
Creating a Deployment
To demonstrate the HPA functionality, we will create a simple deployment using the nginx image.
Create a Deployment YAML
Create a file named nginx-deployment.yaml with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21.0
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
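Note that the resources.requests values matter for autoscaling: utilization-based HPA targets are calculated against the requests. With a CPU request of 100m and a 70% utilization target, for example, the autoscaler adds replicas once average per-pod CPU usage exceeds roughly 70m.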
Apply the Deployment
Apply the deployment using the following command:
kubectl apply -f nginx-deployment.yaml
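You can confirm the rollout completed before moving on:
kubectl rollout status deployment/nginx-deployment
kubectl get pods -l app=nginx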
Enable CPU and Memory HPA
Now that we have our deployment up and running, let’s create an HPA configuration to scale based on CPU and memory.
Create an HPA YAML
Create a file named nginx-hpa.yaml with the following content:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 128Mi
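Apply the HPA and watch its status; the TARGETS column shows current versus target CPU and memory:
kubectl apply -f nginx-hpa.yaml
kubectl get hpa nginx-hpa --watch
To see scaling in action, you can generate some load against the pods. As a rough sketch (this assumes a Service named nginx-deployment exposing the pods, which this guide does not create):
kubectl run load-generator --image=busybox:1.36 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx-deployment; done"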
HPA Use Cases
Here are a few use cases that illustrate when CPU and memory-based HPA can be beneficial in Kubernetes:
| Use Case | Description |
|---|---|
| E-commerce website | During peak times like sales events or holidays, an e-commerce website might experience a surge in traffic, necessitating more resources to handle requests. |
| Media streaming service | A media streaming service needs to scale up when there is an increase in concurrent users streaming content to maintain seamless performance. |
| Data processing pipeline | Data processing pipelines may require additional resources when processing large volumes of data, especially during peak data ingestion periods. |
| Multi-tenant applications | In a multi-tenant application, the varying load from different tenants may require dynamic scaling based on CPU and memory utilization. |
| Online gaming platform | An online gaming platform may experience fluctuations in user count throughout the day, making it essential to scale up or down based on resource usage. |
| Microservices architecture | In a microservices-based system, each service might require dynamic scaling based on the workload, ensuring efficient resource allocation and usage. |
These use cases illustrate how the HPA can automatically scale the number of pods in a deployment based on CPU and memory utilization, ensuring optimal performance and efficient resource usage in Kubernetes.