KEDA Tutorial: Event-Driven Autoscaling in Kubernetes

The Horizontal Pod Autoscaler is useful, but it has a practical constraint: out of the box it scales on CPU and memory. Anything else requires a custom or external metrics adapter that you install and maintain yourself. CPU and memory work fine for stateless web services with predictable load patterns. They do not work well for workloads that are driven by events: a consumer processing messages from a queue, a batch job triggered by a file arriving in S3, or a worker that only needs to run when a Kafka topic has lag.

For these workloads, what you actually want is to scale based on the event source itself. Scale up when the queue has 1000 messages. Scale down when it is empty. Scale to zero when there is nothing to process.

That is exactly what KEDA does.

KEDA (Kubernetes Event-Driven Autoscaling) is a CNCF graduated project that extends Kubernetes with event-driven scaling. It supports over 65 scalers covering the most common event sources: Kafka, RabbitMQ, AWS SQS, Azure Service Bus, Redis, Prometheus metrics, cron schedules, and many more. This tutorial covers how KEDA works, how to install it, and how to configure it for the most common real-world use cases.

How KEDA Works

Before writing any configuration, it helps to understand KEDA's architecture so the YAML you write later makes sense.

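KEDA's architecture has four parts. Three run as Deployments in your cluster (the operator, the metrics server, and the admission webhooks); the scalers are built into KEDA rather than deployed separately: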

The Operator watches for ScaledObject and ScaledJob custom resources. When it finds one, it creates a HorizontalPodAutoscaler backed by external metrics and starts the scaling loop.

The Metrics Server is an implementation of the Kubernetes External Metrics API. It exposes the values reported by your configured scalers as metrics that the HPA can act on.

The Scaler is the connector to a specific event source. Each scaler knows how to talk to its source, whether that means polling an SQS queue, reading Kafka consumer group lag, or checking a Redis list length, and returns a metric value that represents the current load.

Admission Webhooks validate your ScaledObject configurations before they are applied, preventing common mistakes like targeting a Deployment that does not exist or configuring conflicting ScaledObjects.

The scaling loop works like this:

KEDA polls the event source on a configurable interval, defaulting to 30 seconds
The scaler returns a metric value, such as queue depth = 450 messages
KEDA compares this to your target threshold, such as 100 messages per pod
The HPA calculates the required replica count, rounding up: 450 / 100 = 4.5, so 5 pods
The Deployment is scaled to 5 replicas

The key insight is that KEDA handles the path from zero to one replica and back. The standard HPA cannot scale from zero because a Deployment with zero pods has no CPU or memory metrics to act on. KEDA solves this by directly managing that zero-to-one transition based on the event source, then handing off to the HPA for further scaling.

Installing KEDA

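KEDA is most commonly installed via Helm, using the official kedacore chart. Helm does not update anything on its own: upgrades, including security fixes, are applied by running helm repo update followed by helm upgrade.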

Add the KEDA Helm repository:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update

Install KEDA into its own namespace:

helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace

Verify the installation:

kubectl get pods -n keda

You should see three pods running: keda-operator, keda-operator-metrics-apiserver, and keda-admission-webhooks.

Check that the CRDs were installed:

kubectl get crd | grep keda

You should see scaledobjects.keda.sh, scaledjobs.keda.sh, triggerauthentications.keda.sh, and clustertriggerauthentications.keda.sh. Newer releases may list additional CRDs.

Core Concepts

ScaledObject

A ScaledObject is the main resource you create to enable KEDA scaling on a Deployment, StatefulSet, or any scalable workload. It maps a trigger, or event source, to a target workload and defines the scaling behavior.

The minimal structure looks like this:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-deployment
  minReplicaCount: 0
  maxReplicaCount: 20
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: rabbitmq
      metadata:
        queueName: my-queue
        mode: QueueLength
        value: "10"

pollingInterval controls how often KEDA checks the event source, in seconds. Lower values mean faster scale-up response but more API calls to your event source. cooldownPeriod controls how long KEDA waits, in seconds, after the last trigger reported activity before scaling back to zero. This prevents thrashing if messages arrive in short bursts.
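
One detail worth spelling out: cooldownPeriod only governs the final step down to zero replicas. Scaling between one replica and maxReplicaCount is paced by the HPA that KEDA creates, and that HPA can be tuned through spec.advanced.horizontalPodAutoscalerConfig. The manifest below is a minimal sketch, not a recommended setting. The 120-second window and the 50 percent policy are illustrative assumptions, and rabbitmq-auth refers to the TriggerAuthentication described later in this tutorial.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-deployment
  minReplicaCount: 0
  maxReplicaCount: 20
  pollingInterval: 30   # seconds between checks of the event source
  cooldownPeriod: 300   # seconds of inactivity before the last replica is removed
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:         # passed through to the generated HPA
        scaleDown:
          stabilizationWindowSeconds: 120   # assumed value; the Kubernetes default is 300
          policies:
            - type: Percent
              value: 50           # remove at most half of the current replicas...
              periodSeconds: 60   # ...in any 60-second period
  triggers:
    - type: rabbitmq
      authenticationRef:
        name: rabbitmq-auth
      metadata:
        queueName: my-queue
        mode: QueueLength
        value: "10"
```

A shorter stabilization window makes scale-down more responsive at the cost of more churn when traffic is bursty, so it is worth tuning against real traffic rather than copying these numbers.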

ScaledJob

A ScaledJob is for workloads that run as Kubernetes Jobs rather than long-running Deployments. Instead of scaling replica counts, KEDA creates new Job instances for each batch of work to process.

This is the right model for workloads where each unit of work is a discrete task: processing a file, running a report, or sending a batch of emails. Each Job runs to completion and terminates, rather than staying alive to process items from a queue continuously.

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: my-job-scaler
  namespace: default
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: worker
            image: my-worker:latest
        restartPolicy: Never
  triggers:
    - type: rabbitmq
      metadata:
        queueName: tasks
        mode: QueueLength
        value: "1"
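
The manifest above relies on defaults for pacing and cleanup. As a hedged sketch, the same ScaledJob can set those knobs explicitly. The numeric values are assumptions chosen for illustration, and rabbitmq-auth refers to the TriggerAuthentication described in the next section.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: my-job-scaler
  namespace: default
spec:
  pollingInterval: 30            # seconds between checks of the queue
  maxReplicaCount: 10            # assumed upper bound KEDA uses when deciding how many Jobs to start
  successfulJobsHistoryLimit: 5  # completed Jobs kept for inspection
  failedJobsHistoryLimit: 5      # failed Jobs kept for debugging
  jobTargetRef:
    backoffLimit: 2              # standard Job field: retries before the Job is marked failed
    template:
      spec:
        containers:
          - name: worker
            image: my-worker:latest
        restartPolicy: Never
  triggers:
    - type: rabbitmq
      authenticationRef:
        name: rabbitmq-auth
      metadata:
        queueName: tasks
        mode: QueueLength
        value: "1"
```

Note that a ScaledJob has no cooldownPeriod; Jobs simply run to completion, so pollingInterval and maxReplicaCount are the main levers for how quickly new Jobs appear.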

TriggerAuthentication

Most scalers need credentials to connect to your event source: a connection string, API key, or cloud credentials. TriggerAuthentication is the secure way to provide these.

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
  namespace: default
spec:
  secretTargetRef:
    - parameter: host
      name: rabbitmq-secret
      key: connection-string

You then reference this in your ScaledObject:

triggers:
  - type: rabbitmq
    authenticationRef:
      name: rabbitmq-auth
    metadata:
      queueName: my-queue
      mode: QueueLength
      value: "10"

This keeps credentials out of your ScaledObject manifests and in Kubernetes Secrets where they belong.

Example 1: Scaling on RabbitMQ Queue Depth

This is the classic KEDA use case. You have a consumer Deployment processing messages from a RabbitMQ queue and you want it to scale based on how many messages are waiting.

First, create the secret with your RabbitMQ connection string:

kubectl create secret generic rabbitmq-secret \
  --from-literal=connection-string="amqp://user:password@rabbitmq.default.svc.cluster.local:5672"

Create the TriggerAuthentication:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
  namespace: default
spec:
  secretTargetRef:
    - parameter: host
      name: rabbitmq-secret
      key: connection-string

Create the ScaledObject:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: rabbitmq-consumer
  minReplicaCount: 0
  maxReplicaCount: 30
  pollingInterval: 15
  cooldownPeriod: 60
  triggers:
    - type: rabbitmq
      authenticationRef:
        name: rabbitmq-auth
      metadata:
        protocol: amqp
        queueName: order-processing
        mode: QueueLength
        value: "10"

With this configuration, your rabbitmq-consumer Deployment will:

Stay at zero replicas when the queue is empty
Scale to 1 pod when messages start arriving
Scale up to 1 pod per 10 messages in the queue, up to 30 pods maximum
Scale back to zero 60 seconds after the queue drains

To verify KEDA is working, check the ScaledObject status:

kubectl get scaledobject rabbitmq-consumer-scaler
kubectl describe scaledobject rabbitmq-consumer-scaler

You should see the current metric value and the last scaling decision in the output.

Example 2: Scaling on AWS SQS

For AWS workloads, KEDA integrates with SQS using either an access key or IRSA (IAM Roles for Service Accounts), which is the recommended approach for EKS clusters.

Using IRSA:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 0
  maxReplicaCount: 50
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
        queueLength: "5"
        awsRegion: us-east-1
        identityOwner: operator
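
With identityOwner: operator, the scaler uses the IAM role attached to the KEDA operator's own service account. Recent KEDA documentation also describes expressing the same intent through a TriggerAuthentication with podIdentity. The sketch below assumes a KEDA release that accepts provider: aws (older releases used aws-eks) and a hypothetical resource name, aws-pod-identity; check the documentation for your version before relying on it.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-pod-identity     # hypothetical name
  namespace: default
spec:
  podIdentity:
    provider: aws            # assumption: supported by your KEDA version
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 0
  maxReplicaCount: 50
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-pod-identity
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
        queueLength: "5"
        awsRegion: us-east-1
```

Either way, the IAM role involved needs permission to read the queue's attributes, since that is how the scaler learns the queue depth.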

Using explicit credentials for non-EKS or development environments:

kubectl create secret generic aws-credentials \
  --from-literal=AWS_ACCESS_KEY_ID=your-key-id \
  --from-literal=AWS_SECRET_ACCESS_KEY=your-secret-key

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-auth
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: aws-credentials
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-credentials
      key: AWS_SECRET_ACCESS_KEY
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 0
  maxReplicaCount: 50
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-auth
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789/my-queue
        queueLength: "5"
        awsRegion: us-east-1

Example 3: Scaling on Kafka Consumer Lag

Kafka-based scaling is one of the most common KEDA use cases in data engineering teams. The scaler monitors consumer group lag: the difference between the latest offset in the topic and the offset your consumer group has reached.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 20
  pollingInterval: 15
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: my-consumer-group
        topic: events
        lagThreshold: "100"
        offsetResetPolicy: latest

Note that minReplicaCount is set to 1 here rather than 0. Scaling a Kafka consumer to zero is possible but requires careful handling of partition rebalancing. For most production use cases, keeping at least one replica running is safer and avoids warm-up latency when the first message arrives.

If your Kafka cluster requires authentication:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-auth
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-secret
      key: sasl-mechanism
    - parameter: username
      name: kafka-secret
      key: username
    - parameter: password
      name: kafka-secret
      key: password
    - parameter: tls
      name: kafka-secret
      key: tls-enabled
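
The TriggerAuthentication only takes effect once a trigger references it. As a minimal sketch, the Secret it reads from and the matching ScaledObject could look like the following. The user name, password, and chosen SASL mechanism are placeholders, and the sketch assumes kafka-auth was created in the default namespace.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-secret
  namespace: default
type: Opaque
stringData:
  sasl-mechanism: plaintext   # assumed; the Kafka scaler also accepts scram_sha256 and scram_sha512
  username: my-kafka-user     # placeholder
  password: change-me         # placeholder
  tls-enabled: enable         # the Kafka scaler expects enable or disable
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 20
  pollingInterval: 15
  triggers:
    - type: kafka
      authenticationRef:
        name: kafka-auth      # the TriggerAuthentication defined above
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: my-consumer-group
        topic: events
        lagThreshold: "100"
        offsetResetPolicy: latest
```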

Example 4: Cron-Based Scaling

Not all scaling needs to be reactive. Some workloads have predictable load patterns: a reporting job that runs at 9am, a batch process at midnight, or a customer-facing service that needs extra capacity during business hours.

KEDA's cron scaler handles this:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: web-api
  triggers:
    - type: cron
      metadata:
        timezone: Europe/London
        start: "0 8 * * 1-5"
        end: "0 19 * * 1-5"
        desiredReplicas: "10"

Outside the cron window, the Deployment scales back to its minimum replica count. Because minReplicaCount defaults to 0 when it is not set, set it explicitly for any service that must stay up outside business hours. You can combine the cron scaler with other scalers using multiple triggers. The HPA uses the highest replica count computed across all triggers, so the cron scaler acts as a floor during business hours while still allowing reactive scaling beyond 10 replicas if load increases further.
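
As a hedged sketch of that combination, the manifest below keeps the cron trigger from above, adds KEDA's cpu trigger for reactive scaling, and sets the replica bounds explicitly. The bounds and the 70 percent target are assumptions for illustration, and the cpu trigger only works if the web-api containers declare CPU requests.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: web-api
  minReplicaCount: 2    # assumed off-hours floor; KEDA defaults to 0 when this is omitted
  maxReplicaCount: 30   # assumed ceiling
  triggers:
    - type: cron
      metadata:
        timezone: Europe/London
        start: "0 8 * * 1-5"
        end: "0 19 * * 1-5"
        desiredReplicas: "10"   # floor during the cron window
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"             # assumed target: average CPU utilization in percent
```

During the window the Deployment runs at least 10 replicas and can grow toward 30 under CPU pressure; outside it, the floor drops to 2.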

Combining Multiple Triggers

KEDA supports multiple triggers on a single ScaledObject. This is useful for workloads that need to respond to several independent signals.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: multi-trigger-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 0
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      authenticationRef:
        name: rabbitmq-auth
      metadata:
        queueName: high-priority
        mode: QueueLength
        value: "5"
    - type: rabbitmq
      authenticationRef:
        name: rabbitmq-auth
      metadata:
        queueName: low-priority
        mode: QueueLength
        value: "20"
    - type: cron
      metadata:
        timezone: UTC
        start: "0 6 * * *"
        end: "0 22 * * *"
        desiredReplicas: "2"

KEDA evaluates all triggers and uses the highest resulting replica count. In this example, the worker scales based on whichever of the two queues is generating more demand, and stays at a minimum of 2 replicas during the cron window.

Scaling to Zero in Practice

Scale-to-zero is one of KEDA's most useful features and one of the most misunderstood. A few things to know before enabling it in production:

Cold start latency is real. When a Deployment scales from zero, there is a lag between the first message arriving and a pod being ready to process it. For stateless workloads on fast-starting images this might be a few seconds. For JVM applications or large containers it can be 30-60 seconds or more. Messages will queue up during this time, which is usually fine for async workloads but may not be acceptable for latency-sensitive ones.

The cooldown period matters. Set it based on your traffic patterns. If you regularly see short bursts of messages with quiet periods in between, a high cooldown period (300-600 seconds) prevents the Deployment from scaling to zero and back repeatedly. If your workload is genuinely idle for hours between runs, a short cooldown is fine.

Graceful shutdown is essential. Make sure your consumer pods handle SIGTERM correctly. They should finish processing the current message before exiting, not drop it. Kubernetes sends SIGTERM when scaling down, so your application needs to catch it and drain any in-flight work.
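
On the Kubernetes side, the time a pod gets between SIGTERM and a forced kill is set by terminationGracePeriodSeconds, which defaults to 30 seconds. The Deployment below is a minimal sketch of giving a consumer more room to drain. The image name, the labels, and the 120-second value are assumptions; the application itself still has to catch SIGTERM and stop taking new messages.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-consumer
  namespace: default
spec:
  # replicas is intentionally omitted so that re-applying this manifest
  # does not fight with the replica count managed by KEDA and the HPA.
  selector:
    matchLabels:
      app: rabbitmq-consumer
  template:
    metadata:
      labels:
        app: rabbitmq-consumer
    spec:
      terminationGracePeriodSeconds: 120   # assumed: longer than the slowest message takes to process
      containers:
        - name: consumer
          image: my-consumer:latest        # hypothetical image
```

If a message can take longer than the grace period, the pod is killed mid-flight, so the broker's redelivery settings matter as much as this value.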

Monitoring KEDA

Check the status of your ScaledObjects:

kubectl get scaledobjects -A

Get detailed status including current metric value and scaling decisions:

kubectl describe scaledobject my-scaler -n default

Check KEDA operator logs for troubleshooting:

kubectl logs -n keda deployment/keda-operator -f

Check the metrics server for the raw metric values:

kubectl logs -n keda deployment/keda-operator-metrics-apiserver -f

KEDA exposes Prometheus metrics on port 8080 of the operator pod. If you have Prometheus running, scrape these for operational visibility into KEDA's scaling decisions, error rates, and scaler latency.

Common Mistakes and How to Avoid Them

Forgetting namespace on TriggerAuthentication. A TriggerAuthentication in namespace A cannot be referenced by a ScaledObject in namespace B. Make sure they are in the same namespace, or use a ClusterTriggerAuthentication for cross-namespace authentication.

Setting pollingInterval too low. Polling every 5 seconds across 20 ScaledObjects means 240 requests per minute to your event source. For SQS this costs money. For RabbitMQ it creates unnecessary load. Start with 30 seconds and only reduce it if you genuinely need faster scale-up response.

Not setting maxReplicaCount. If you omit it, KEDA falls back to a default of 100, which may be far more than your cluster or downstream systems can absorb during a sudden spike in queue depth. Always set a sensible maximum.

Scaling Kafka consumers beyond partition count. If your topic has 10 partitions and you scale to 15 consumers, the extra 5 consumers sit idle, because each partition is consumed by at most one member of a consumer group. Set maxReplicaCount no higher than your partition count unless you are processing from multiple topics.

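Assuming cooldownPeriod applies to batch workloads. cooldownPeriod is a ScaledObject setting; a ScaledJob does not use it. If your ScaledJob processes one message per Job, a very low pollingInterval combined with a high maxReplicaCount makes KEDA create Job objects rapidly, increasing API server load. Tune pollingInterval and maxReplicaCount based on how long your Jobs typically take to run, and use successfulJobsHistoryLimit and failedJobsHistoryLimit to control how many finished Jobs are kept.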
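
To illustrate the cross-namespace case from the first item in this list, here is a minimal sketch of a ClusterTriggerAuthentication shared with a ScaledObject in another namespace. The names rabbitmq-cluster-auth, team-b, team-b-consumer, and team-b-queue are hypothetical. The referenced Secret is assumed to live in the namespace KEDA reads cluster-level secrets from, which is the KEDA installation namespace by default.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: rabbitmq-cluster-auth   # cluster-scoped, so no namespace
spec:
  secretTargetRef:
    - parameter: host
      name: rabbitmq-secret     # assumed to exist in the KEDA installation namespace
      key: connection-string
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: team-b-consumer-scaler
  namespace: team-b
spec:
  scaleTargetRef:
    name: team-b-consumer
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      authenticationRef:
        name: rabbitmq-cluster-auth
        kind: ClusterTriggerAuthentication   # defaults to TriggerAuthentication when omitted
      metadata:
        queueName: team-b-queue
        mode: QueueLength
        value: "10"
```

Forgetting the kind field is a common follow-on mistake: without it, KEDA looks for a namespaced TriggerAuthentication with that name and reports that it cannot be found.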

Frequently Asked Questions

What is KEDA?

KEDA (Kubernetes Event-Driven Autoscaling) is an open source CNCF graduated project that enables event-driven autoscaling in Kubernetes. It allows workloads to scale based on external event sources like message queues, streaming platforms, and databases, rather than only on CPU and memory metrics.

How is KEDA different from the HorizontalPodAutoscaler?

The standard HPA scales based on CPU and memory metrics from within the cluster. KEDA extends this by connecting to external event sources like Kafka, SQS, RabbitMQ, Redis, and many others, then scaling based on those metrics. KEDA also enables scaling to zero replicas.

Can KEDA scale to zero?

Yes. KEDA can scale a Deployment from zero to one replica when events arrive, and back to zero when the event source is empty. This is useful for workloads that are idle most of the time, such as batch jobs, queue consumers, and development environments.

What scalers does KEDA support?

KEDA supports over 65 scalers, including Apache Kafka, RabbitMQ, AWS SQS, AWS Kinesis, Azure Service Bus, Azure Storage Queue, Azure Event Hubs, GCP Pub/Sub, Redis, NATS, Prometheus metrics, MySQL, PostgreSQL, MongoDB, Elasticsearch, and Cron.

Does KEDA replace the HPA?

No. KEDA works alongside the HPA. For event-driven scaling, KEDA creates and manages an HPA under the hood, backed by the external metrics it collects. You interact with KEDA through ScaledObject resources and KEDA manages the HPA for you.

What is the difference between ScaledObject and ScaledJob?

A ScaledObject scales a Deployment, StatefulSet, or similar workload by adjusting its replica count. A ScaledJob creates Kubernetes Job objects in response to events. Use ScaledObject for long-running consumers; use ScaledJob for discrete processing tasks where each unit of work is independent.

Is KEDA production-ready?

Yes. KEDA is a CNCF graduated project, the highest maturity level in the CNCF ecosystem. It is widely used in production and is supported as a first-class feature on Azure Kubernetes Service.

How do I debug KEDA when scaling is not working?

Start with kubectl describe scaledobject my-scaler to see the current state and error messages. Check the KEDA operator logs with kubectl logs -n keda deployment/keda-operator. Verify your TriggerAuthentication is in the same namespace as the ScaledObject, and make sure KEDA can reach your event source from inside the cluster.

What to Build Next

Once you have KEDA running with one scaler, the logical next steps are:

Multi-cluster scaling. Run KEDA on every cluster and centralize your event sources so workloads on different clusters compete for the same queue capacity.

Cost-aware scaling. Combine KEDA with OpenCost metrics to build scaling policies that account for cost, for example preferring to scale spot node pools before on-demand ones.

GitOps deployment. Manage ScaledObjects and TriggerAuthentications in Git alongside your Deployment manifests using FluxCD or ArgoCD, so your scaling configuration gets the same review and rollout process as your application code.

Observability. Scrape KEDA's Prometheus metrics and build dashboards that show scaling decisions over time, helping you tune pollingInterval, cooldownPeriod, and threshold values based on real traffic patterns.

KEDA is one of those tools that changes how you think about Kubernetes workloads once you start using it. The shift from "how much CPU does this need?" to "how many events are waiting?" is a more natural model for a lot of real-world workloads, and the scale-to-zero capability alone can meaningfully reduce your cluster costs.
