Azure Kubernetes Chronicles, part 5: Autoscaling with KEDA

Autoscaling is a transformative approach for modern cloud-native application architectures, particularly within Kubernetes and microservices environments. As the adoption of these technologies accelerates, automatically adjusting workloads based on real-time demand becomes imperative: it enhances the user experience while ensuring continuous high availability and cost efficiency. Whether you are deploying a customer-facing web application, processing extensive data sets, or orchestrating backend task queues, the ability to scale workloads automatically is essential for operational success and resource optimization.

Kubernetes provides robust auto-scaling capabilities through tools such as the Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler. These tools function similarly to a personal assistant for managing workloads, dynamically adjusting resource allocation based on crucial metrics like CPU utilization, memory consumption, and other performance indicators.

However, in modern applications — particularly those that operate asynchronously, are event-driven, or utilize messaging protocols — traditional resource metrics may not sufficiently capture the nuances of workload requirements.

This article will delve into strategies to effectively leverage Kubernetes’ auto-scaling features to optimize application performance and resource efficiency in contemporary environments.

· Why Autoscaling Matters
· Use Case Scenarios Where Autoscaling is Critical
· Native Kubernetes Autoscaling Techniques
· The Limitations of Native Autoscaling
· Introducing KEDA: Kubernetes Event-driven Autoscaling
· What is KEDA?
· KEDA Architecture
· Benefits of Using KEDA on Azure Kubernetes Service (AKS)
· Deploying KEDA on AKS
· Deploying an Event-Driven Application with KEDA
· Monitoring and Observability with KEDA
· Troubleshooting and Debugging KEDA
· Conclusion: Scaling Smarter with KEDA on AKS

Why Autoscaling Matters

In our fast-paced digital world, where smooth and instant experiences are part of everyday life, autoscaling is your secret weapon for keeping applications responsive while staying kind to your budget. Whether you are preparing for the rush of Black Friday or enjoying quieter periods, your applications need to adapt in real time to ever-changing demand, so your platform is always ready for whatever comes next.

Autoscaling helps achieve three essential goals in modern cloud-native environments:

Performance and Availability

Users expect fast response times, minimal latency, and consistent uptime. Applications that cannot scale to meet user demand risk degraded performance, application crashes, or even outages. Autoscaling ensures that the necessary resources are available when needed, enhancing the user experience and ensuring service level objectives (SLOs) are met.

Resource Efficiency and Cost Optimization

Cloud resources are not free. Autoscaling helps strike a balance between resource allocation and cost by dynamically adjusting compute power to match actual usage. This means you avoid overprovisioning during low-traffic periods and underprovisioning when demand spikes.

This can result in substantial cost savings for organizations running large-scale distributed systems or multiple microservices, especially when leveraging features like KEDA’s scale-to-zero capabilities, which completely shut down idle workloads.

Operational Simplicity and Automation

Manual scaling is inefficient, error-prone, and doesn’t scale with the complexity of modern applications. Autoscaling enables you to automate resource provisioning, reduce operational overhead, and free up engineering time to focus on delivering business value rather than managing infrastructure.

Furthermore, autoscaling aligns perfectly with GitOps and Infrastructure as Code (IaC) principles, making it easier to codify, version, and track infrastructure changes across environments.

Use Case Scenarios Where Autoscaling is Critical

  • E-commerce platforms scaling during flash sales and high-traffic campaigns.
  • SaaS applications responding to dynamic user interactions.
  • Data processing systems handling batch or stream-based workloads.
  • IoT platforms ingesting unpredictable volumes of telemetry data.
  • Event-driven microservices processing jobs from queues, topics, or HTTP triggers.

Native Kubernetes Autoscaling Techniques

Kubernetes offers built-in autoscaling features integral to managing resources in a dynamic environment. These mechanisms include the Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler. Each serves a distinct purpose and targets different scaling dimensions.

Horizontal Pod Autoscaler (HPA)

HPA automatically adjusts the number of pods in a deployment, replica set, or stateful set based on observed CPU utilization (or other select metrics like memory or custom metrics).

  • How it works: HPA uses metrics collected from the Metrics Server (or Prometheus adapter) to make scaling decisions. For example, if the average CPU usage exceeds a defined threshold, HPA increases the number of pods.
  • Example manifest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  • Limitations: HPA is resource-centric. It doesn’t handle event-driven workloads well (e.g., scaling based on queue length or incoming events).

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU and memory requests/limits for containers within pods to optimize resource usage. It’s ideal for workloads that cannot be scaled horizontally or where resource usage patterns vary significantly.

Modes of Operation:

  • Auto: VPA can update pod resources and restart them automatically.
  • Initial: Sets recommendations only when a pod is first created.
  • Off: VPA monitors usage and provides recommendations, but doesn’t enforce changes.
  • Example VPA Manifest:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  updatePolicy:
    updateMode: Auto

Limitations: VPA restarts pods when applying new resource values, which may not be suitable for high-availability workloads unless carefully orchestrated.
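
If those restarts are a concern, a common pattern is to run VPA in recommendation-only mode and fold the suggested requests into your deployment manifests through your normal release process. A minimal sketch, changing only the updateMode of the example above:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  updatePolicy:
    # "Off" still produces recommendations (visible via kubectl describe vpa backend-vpa)
    # but never evicts or restarts pods.
    updateMode: "Off"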

Cluster Autoscaler

Cluster Autoscaler operates at the infrastructure level and adjusts the number of nodes in a cluster. It scales out when pods are unschedulable due to resource constraints and scales in when nodes are underutilized.

Key Features

  • Works with AKS, GKE, and EKS.
  • Scales only node pools with autoscaling enabled.

Example (AKS CLI):

az aks nodepool update \
--resource-group myResourceGroup \
--cluster-name myAKSCluster \
--name nodepool1 \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 5
  • Limitations: Cluster autoscaling can take several minutes and may not react quickly to sudden workload spikes.

Depending on your workload characteristics, these autoscaling mechanisms can be used independently or in combination. However, none inherently support autoscaling based on external event sources like queues or message buses, where KEDA adds its unique value.

The Limitations of Native Autoscaling

While Kubernetes’ native autoscaling mechanisms are powerful and essential for many workloads, they have inherent limitations that reduce their effectiveness in modern, event-driven, and microservice-heavy environments. Let’s explore some of the most significant gaps and constraints.

Resource-Centric Metrics Only

HPA and VPA rely heavily on CPU and memory utilization as primary indicators for scaling. While this is useful for compute-bound applications, it doesn’t capture real application pressure in event-driven systems.

Example: A message queue-backed service may experience a massive influx of messages. However, if the current pods are underutilized (CPU-wise), HPA won’t scale the workload, even though the backlog is growing.

This creates a bottleneck where messages accumulate and latency increases, leading to delayed processing and a poor user experience.

Lack of Event Awareness

None of the native autoscalers in Kubernetes can natively respond to external signals like queue length, HTTP request volume, database entries, or cloud events.

This severely limits their use in event-driven architectures where these external signals represent the application load. Without this context, scaling decisions are essentially blind to the system’s actual needs.

Slow Reaction Times

Cluster Autoscaler can take several minutes to provision new nodes. While it’s excellent for optimizing node pool sizes and preventing resource waste, it’s too slow for latency-sensitive applications that require rapid response to surges in demand.

Similarly, HPA’s default sync period (15 seconds) and gradual scale-up strategy may not be sufficient for workloads that spike quickly and require immediate action.
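
That said, the scale-up strategy is tunable. The behavior field in the autoscaling/v2 API lets you shorten stabilization windows and permit larger scaling steps; the sketch below extends the earlier webapp-hpa example to react more aggressively to spikes (the values are illustrative, not recommendations):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      # React immediately instead of waiting for a stabilization window.
      stabilizationWindowSeconds: 0
      policies:
      # Allow the replica count to double every 15 seconds during a spike.
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleDown:
      # Scale down conservatively to avoid flapping.
      stabilizationWindowSeconds: 300

Even with aggressive behavior settings, the trigger is still CPU utilization, so the lack of event awareness described above remains.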

No Support for Scale-to-Zero

A critical feature for cost efficiency in asynchronous workloads is the ability to scale to zero when there is no work to do. Native autoscaling does not support this concept — HPA requires a minimum of one replica, and Cluster Autoscaler will not remove the last node if it would render the cluster unschedulable.

This represents a significant inefficiency for workloads that remain idle most of the time but must respond quickly when triggered.

Complex Configuration for Custom Metrics

While HPA supports custom metrics through adapters such as the Prometheus Adapter, setting this up can be complex and prone to errors. It also necessitates maintaining a separate monitoring and metrics collection infrastructure for scaling decisions.

This overhead often becomes a barrier for teams that need quick, flexible autoscaling capabilities.
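
To illustrate where the effort goes, the manifest below sketches only the HPA half of such a setup. It assumes the Prometheus Adapter is already installed and configured with a rule exposing a per-pod metric named http_requests_per_second (a hypothetical metric name); the adapter rules, the Prometheus deployment, and the scrape configuration all live outside this manifest and must be maintained separately.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-requests-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Served by the Prometheus Adapter through the custom metrics API;
  # the adapter rule mapping Prometheus data to this name is configured separately.
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"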

Workload Compatibility Gaps

Some workloads do not scale well using traditional resource-based indicators:

  • Jobs and batch processes
  • Queue consumers
  • Stateful services

These workloads require scaling decisions based on external conditions (like queue depth or job count) rather than internal pod metrics. Native Kubernetes autoscalers are not optimized for such patterns.

Inconsistent Behaviour Across Cloud Providers

While Kubernetes is cloud-agnostic, autoscaling behaviour can vary depending on how a managed Kubernetes service (like AKS, EKS, GKE) implements metrics collection, cluster scaling policies, and integration with cloud-native services.

This inconsistency complicates hybrid or multi-cloud strategies where uniform autoscaling behavior is required.

These limitations underscore the need for a more extensible and event-aware autoscaling system that integrates with external services and supports advanced scaling patterns. This is precisely where KEDA (Kubernetes Event-driven Autoscaling) steps in.

Introducing KEDA: Kubernetes Event-driven Autoscaling

Kubernetes Event-driven Autoscaling (KEDA) is a lightweight, open-source component that brings event-based autoscaling to Kubernetes. It allows applications to scale dynamically based on the number of events needing to be processed, whether those events are messages in a queue, rows in a database, or custom metrics from a monitoring system.

Originally developed by Microsoft and Red Hat, KEDA has since evolved into a robust and widely adopted project under the Cloud Native Computing Foundation (CNCF).

KEDA effectively bridges the gap between external event sources and Kubernetes’ native autoscaling framework (HPA). It does this by exposing custom metrics to Kubernetes and, when necessary, automatically launching or shutting down workloads, including scale-to-zero scenarios, which native Kubernetes cannot do independently.

What is KEDA?

KEDA is a Kubernetes-based event-driven autoscaler that enables fine-grained, real-time autoscaling for container workloads. Unlike the default HPA, which scales pods based on resource metrics, KEDA enables workloads to scale based on external data sources like message queues, event streams, HTTP request rates, etc.

KEDA does this in two primary ways:

  1. Metrics Adapter: KEDA is a metrics provider that feeds custom metrics into Kubernetes’ HPA via the Metrics API. HPA uses these metrics to make scaling decisions.
  2. Activation Controller: KEDA can activate and deactivate Kubernetes Deployments, including scaling them to zero and back again when event thresholds are crossed.

KEDA Architecture

At a high level, KEDA consists of the following components:

  • Operator: Watches for custom KEDA resources (ScaledObjects, ScaledJobs) and manages scaling behavior.
  • Metrics Adapter: Exposes custom metrics to the Kubernetes Metrics API for use by HPA.
  • Trigger Scalers: Interfaces that define how to pull metric data from external sources. These include built-in scalers for Azure Service Bus, RabbitMQ, Kafka, Prometheus, AWS SQS, and more.
[Architecture diagram]

Core Concepts

ScaledObject: A Kubernetes custom resource that defines how to scale a particular Deployment based on an event source. Each ScaledObject includes:

  • The target deployment to scale.
  • The scaling trigger type (e.g., Azure Service Bus, Kafka).
  • Trigger metadata such as connection strings, queue names, and thresholds.

ScaledJob: A special resource designed for one-time jobs or batch processing. KEDA launches short-lived pods that perform work based on events.

Triggers: The core mechanism KEDA uses to evaluate when and how to scale. They are defined by source type and contain metadata such as polling interval, threshold, and authentication information.

Example ScaledObject YAML:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: azure-queue-scaler
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: azure-queue
    metadata:
      queueName: myqueue
      connectionFromEnv: AzureWebJobsStorage
      queueLength: "5"
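
The ScaledJob resource mentioned above follows the same trigger model but creates short-lived Kubernetes Jobs instead of scaling a long-running Deployment. A minimal sketch, reusing the queue from the example above and assuming a hypothetical worker image that drains messages and exits:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azure-queue-job
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: worker
          image: myregistry.azurecr.io/queue-worker:latest   # hypothetical worker image
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 20
  triggers:
  - type: azure-queue
    metadata:
      queueName: myqueue
      connectionFromEnv: AzureWebJobsStorage
      queueLength: "5"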

Supported Event Sources

KEDA supports over 50 scalers out-of-the-box. Popular scalers include:

  • Azure Service Bus
  • Azure Storage Queues
  • Azure Event Hubs
  • Kafka
  • RabbitMQ
  • AWS SQS and CloudWatch
  • PostgreSQL and MySQL
  • Prometheus queries
  • Redis Streams
  • Cron (time-based triggers)

For a complete list, refer to: https://keda.sh/docs/

Benefits of Using KEDA on Azure Kubernetes Service (AKS)

KEDA is such a fantastic extension for Kubernetes, and when combined with Azure Kubernetes Service (AKS), it truly shines! Microsoft Azure offers a first-class managed experience for KEDA, simplifying integration, operation, and scaling of production workloads. Below are the key benefits of using KEDA specifically on AKS:

First-Party AKS Integration and Managed Add-on Support

KEDA is available as a native add-on for AKS, which dramatically simplifies the installation and lifecycle management process. You can enable KEDA directly via the Azure CLI, ARM templates, or Bicep.

az aks update \
--resource-group myResourceGroup \
--name myAKSCluster \
--enable-keda

Azure manages the underlying KEDA components, including the operator and metrics server, providing better stability, supportability, and seamless integration with Azure RBAC and identity management.

Azure-Native Event Source Support

KEDA offers deep integration with Azure services out of the box, including:

  • Azure Storage Queues
  • Azure Service Bus (Queues and Topics)
  • Azure Event Hubs
  • Azure Monitor (via custom metrics)
  • Azure Blob Storage

This allows AKS workloads to respond natively to Azure ecosystem signals, without requiring additional wrappers, shims, or bridge services.

Seamless Identity and Secret Management

KEDA on AKS can leverage:

  • Azure AD Pod Identity (for secure access to services without connection strings)
  • Azure Key Vault Provider for Secrets Store CSI Driver (for securely injecting credentials)
  • Managed Identities (for tightly scoped and rotated permissions)

These integrations help reduce the risk of credential leaks, simplify compliance with enterprise-grade security policies, and support the implementation of zero trust.
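
In practice, this means a trigger can delegate authentication to a TriggerAuthentication resource instead of reading a connection string from the environment. The sketch below assumes Azure Workload Identity is already configured for the cluster and the KEDA operator; the deployment and queue names are reused from the earlier example, and the storage account name is a placeholder:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azure-queue-auth
spec:
  podIdentity:
    # Use the pod's Azure identity instead of a stored connection string.
    provider: azure-workload
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: azure-queue-scaler-identity
spec:
  scaleTargetRef:
    name: queue-processor
  triggers:
  - type: azure-queue
    metadata:
      queueName: myqueue
      accountName: <storage-account-name>   # placeholder: the storage account that owns the queue
    authenticationRef:
      name: azure-queue-auth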

Scale-to-Zero for Cost Efficiency

With KEDA, AKS workloads can scale down to zero replicas when idle. This is especially powerful for workloads that only run occasionally or during business hours.

Example use cases:

  • Line-of-business apps that are active only during working hours.
  • Batch processing jobs triggered via queue messages.
  • Seasonal workloads with unpredictable traffic patterns.

When combined with Azure Spot VMs and cluster autoscaler, KEDA enables highly cost-efficient architectures.
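
For the business-hours scenario above, KEDA's cron scaler can hold a baseline replica count during working hours and let the workload fall back to zero outside them. A minimal sketch, assuming a hypothetical line-of-business deployment named lob-app and Western European office hours:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: lob-app-business-hours
spec:
  scaleTargetRef:
    name: lob-app             # hypothetical deployment name
  minReplicaCount: 0          # scale to zero outside the cron window
  triggers:
  - type: cron
    metadata:
      timezone: Europe/Amsterdam
      start: 0 8 * * 1-5      # weekdays at 08:00
      end: 0 18 * * 1-5       # weekdays at 18:00
      desiredReplicas: "3"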

Advanced Telemetry and Monitoring Support

When running on AKS, KEDA integrates with Azure Monitor automatically and can be further enhanced with Prometheus and Grafana. You can visualize metrics such as:

  • Number of active messages in a queue
  • Replica count trends
  • Time spent at peak vs idle

These insights help fine-tune your scaling policies and provide visibility into autoscaling performance.

Simplified Developer and Ops Experience

With Azure-native tooling (Azure CLI, Bicep, Azure Monitor, AKS diagnostics), engineers can easily provision and manage KEDA. It fits well into both GitOps and IaC strategies, enabling teams to:

  • Automate deployments
  • Version and review scaling configs
  • Audit autoscaling activity

Microsoft Support and Enterprise Compliance

KEDA is officially supported by Microsoft as part of the AKS ecosystem, which means you can escalate issues, file support tickets, and receive help under your enterprise support agreements.

Furthermore, AKS clusters running KEDA can be integrated with:

  • Azure Policy for compliance enforcement
  • Azure Defender for threat protection
  • Azure Arc for hybrid observability

These benefits make KEDA on AKS a compelling choice for modern applications that need fast, flexible, and secure autoscaling capabilities across various event sources.

Deploying KEDA on AKS

There are two primary ways to deploy KEDA on Azure Kubernetes Service (AKS):

  • Using the built-in AKS add-on: the recommended and most straightforward approach.
  • Using Helm: for custom scenarios where greater configuration control is needed.

Both options offer a seamless integration into your AKS cluster. Let’s explore each approach.

Prerequisites

Before you begin, ensure the following:

  • You have an existing AKS cluster running Kubernetes 1.20 or higher.
  • You have the Azure CLI and kubectl installed and configured.
  • You have Contributor or higher access to the AKS cluster’s resource group.

Verify Azure CLI version:

az version

Check kubectl context:

kubectl config get-contexts

Option 1: Enabling the KEDA Add-on (Recommended)

The simplest and most supported way to deploy KEDA on AKS is using the built-in AKS add-on:

az aks update \
--resource-group <resource-group> \
--name <cluster-name> \
--enable-keda

Note: Replace <cluster-name> and <resource-group> with your actual values.

This will:

  • Deploy the KEDA operator and metrics server to the kube-system namespace.
  • Configure appropriate permissions via Azure RBAC.
  • Enable automatic updates and patching via Azure.

You can verify the KEDA components are running using:

kubectl get pods -n kube-system | grep keda

You should see pods like keda-operator and keda-metrics-apiserver in a Running state.

Option 2: Installing KEDA Manually with Helm

Helm allows you to customize your KEDA installation (e.g., custom namespaces, scaling intervals, Prometheus integration) if you need more flexibility.

Step 1: Add the KEDA Helm repository

helm repo add kedacore https://kedacore.github.io/charts
helm repo update

Step 2: Create a dedicated namespace (optional but recommended)

kubectl create namespace keda

Step 3: Install KEDA with Helm

helm install keda kedacore/keda \
--namespace keda \
--set prometheus.metricServer.enabled=true

Step 4: Confirm installation

kubectl get all -n keda

This should list the KEDA operator deployment, service, and metrics server.

Post-Deployment Checks

Regardless of your installation method:

Ensure your Metrics Server is functioning:

kubectl top pods

Confirm the CRDs (Custom Resource Definitions) for ScaledObject and ScaledJob are installed:

kubectl get crds | grep keda

Expected output:


scaledobjects.keda.sh
scaledjobs.keda.sh
triggerauthentications.keda.sh

With KEDA deployed and operational, your AKS cluster can now scale workloads dynamically based on external event sources.

Deploying an Event-Driven Application with KEDA

Now that KEDA is installed and operational, let’s walk through a hands-on example. In this section, we’ll deploy an event-driven application that consumes an Azure Storage Queue and scales dynamically using KEDA’s ScaledObject mechanism.

This example demonstrates how to:

  1. Create an Azure Storage Queue.
  2. Deploy a queue processor in AKS.
  3. Configure a KEDA ScaledObject to autoscale based on queue length.

Step 1: Set Up Azure Resources

Ensure the Azure CLI is logged in and targeting the correct subscription and region:

az login
az account set --subscription "YourSubscriptionName"

Create a resource group:

az group create --name demo-keda-rg --location westeurope

Create a storage account:

az storage account create \
--name demokedastorageacct \
--resource-group demo-keda-rg \
--location westeurope \
--sku Standard_LRS

Retrieve the storage account connection string:

az storage account show-connection-string \
--name demokedastorageacct \
--resource-group demo-keda-rg \
--query connectionString --output tsv

Create a queue:

az storage queue create \
--name ordersqueue \
--account-name demokedastorageacct

Step 2: Create Kubernetes Secrets for the Queue Connection

Store the Azure Storage connection string securely in AKS:

kubectl create secret generic azure-queue-secret \
--from-literal=AzureWebJobsStorage="<connection-string>"

Replace <connection-string> with the value retrieved in the previous step.

Step 3: Deploy the Queue Processor Application

Here is a sample Kubernetes deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: queue-processor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: queue-processor
  template:
    metadata:
      labels:
        app: queue-processor
    spec:
      containers:
      - name: queue-processor
        image: myregistry.azurecr.io/queue-processor:latest
        env:
        - name: AzureWebJobsStorage
          valueFrom:
            secretKeyRef:
              name: azure-queue-secret
              key: AzureWebJobsStorage

Apply the deployment:

kubectl apply -f queue-processor-deployment.yaml

Ensure the pod is running:

kubectl get pods -l app=queue-processor

Step 4: Define a KEDA ScaledObject

Here’s a basic ScaledObject manifest that uses the Azure Storage Queue scaler:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-scaler
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: azure-queue
    metadata:
      queueName: ordersqueue
      connectionFromEnv: AzureWebJobsStorage
      queueLength: "5"

Apply the ScaledObject:

kubectl apply -f queue-scaler.yaml

Verify the ScaledObject:

kubectl get scaledobject

Step 5: Generate Load

You can enqueue messages to the storage queue manually to trigger scaling:

for i in {1..20}; do
  az storage message put \
    --account-name demokedastorageacct \
    --queue-name ordersqueue \
    --content "Message $i"
  echo "Message $i enqueued"
done

Check the number of replicas:

kubectl get deployment queue-processor

Within moments, KEDA will detect the backlog and scale the queue-processor deployment accordingly.

Monitoring and Observability with KEDA

KEDA integrates with Kubernetes’ metrics pipeline, making monitoring autoscaling activity straightforward through standard observability tools like the Kubernetes Metrics Server, Prometheus, and Grafana.

Metrics Server Integration

The Kubernetes Metrics Server collects resource usage metrics from pods and nodes. While KEDA provides custom metrics, the Metrics Server is still essential for HPA to make decisions.

To verify the Metrics Server is running:

kubectl get deployment metrics-server -n kube-system

Check live metrics:

kubectl top pods

If this returns metrics, the Metrics Server is operational. If not, you may need to install or troubleshoot it.

Enabling KEDA Metrics for Prometheus

If you’ve installed KEDA via Helm with Prometheus metrics enabled:

helm upgrade --install keda kedacore/keda \
--namespace keda \
--set prometheus.metricServer.enabled=true

This exposes a /metrics endpoint on the keda-metrics-apiserver that Prometheus can scrape.

To have Prometheus scrape these metrics, define a ServiceMonitor (if you are using the Prometheus Operator):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keda-servicemonitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: keda-operator
  namespaceSelector:
    matchNames:
    - keda
  endpoints:
  - port: http
    interval: 30s

Ensure the KEDA metrics API server is reachable:

kubectl get svc -n keda

Viewing Metrics in Grafana

Once Prometheus is scraping metrics from KEDA, you can build Grafana dashboards using metrics like:

keda_scaledobject_scaler_queueLength
keda_scaledobject_scaled_replicas
keda_scaledobject_active

You can import a prebuilt dashboard or create your own using Prometheus queries to correlate queue depth with pod replicas over time.

Azure Monitor Integration

If using the AKS-managed KEDA add-on, Azure Monitor will automatically collect some metrics. To visualize them:

  • Go to Azure Portal > AKS Cluster > Insights > Workloads.
  • Use Log Analytics queries for deeper inspection:
KubePodInventory
| where ContainerName == "keda-operator"

This allows you to trace KEDA activity and correlate it with application logs.

Monitoring is critical for understanding the behavior of your autoscalers and optimizing for performance and cost.

Troubleshooting and Debugging KEDA

Troubleshooting KEDA involves inspecting logs, validating ScaledObject definitions, and checking event source connectivity. Because KEDA spans multiple components — operator, metrics server, trigger scalers — debugging often involves Kubernetes-level diagnostics and external system checks.

Check the KEDA Operator Logs

The operator is the control plane component that manages autoscaling. Inspect logs to see if scaling decisions are being made:

kubectl logs -l app=keda-operator -n keda

Look for lines like:

  • Successfully updated deployment … from 1 to 3 replicas

Verify ScaledObject Configuration

Ensure the ScaledObject is applied correctly and recognized:

kubectl get scaledobjects.keda.sh
kubectl describe scaledobject

Validate the scaling trigger and metadata: an incorrect queueName, missing credentials, or a misconfigured pollingInterval are common issues.

Check Metrics Server Availability

The metrics server must be available and returning data. Run:

kubectl top pods

If you receive an error, ensure the metrics server is installed and running:

kubectl get deployment metrics-server -n kube-system

Confirm Trigger Activity

For event-driven triggers (like Azure Queues), validate that messages exist in the queue and the queue name matches the configuration.

You can use KEDA logs to trace trigger execution:

kubectl logs -l app=keda-operator -n keda | grep Trigger

Ensure Permissions Are Set Correctly

Missing RBAC roles or incorrect Azure identity configuration can prevent KEDA from authenticating with event sources. Use:

kubectl describe serviceaccount keda-operator -n keda
kubectl get clusterrolebinding | grep keda

Monitor Metrics API Server

If Prometheus scraping is enabled and the KEDA Metrics API server is exposed, ensure it’s operational:

kubectl get pods -n keda | grep keda-metrics-apiserver
kubectl port-forward svc/keda-metrics-apiserver 9022:9022 -n keda
curl http://localhost:9022/metrics

This endpoint should return a list of KEDA metrics.

Thorough diagnostics and logs are critical when working with KEDA.

Conclusion: Scaling Smarter with KEDA on AKS

Autoscaling has become a must-have rather than just a nice feature, especially for creating resilient, responsive, and cost-effective cloud-native applications. Although Kubernetes offers robust built-in autoscalers like HPA, VPA, and the Cluster Autoscaler, they sometimes don’t quite meet the demands of today’s event-driven workloads, where external signals play a crucial role in determining scaling needs. That’s where KEDA steps in as a game-changer. By enabling autoscaling based on real-world events like queue length, HTTP requests, or database state, KEDA transforms your AKS workloads into intelligent, reactive systems that can scale dynamically — and even scale to zero when idle.

Thanks to its native integration with Azure Kubernetes Service, robust support for Azure event sources, and seamless compatibility with popular enterprise tools like Prometheus, Azure Monitor, and GitOps workflows, KEDA makes event-driven scaling powerful and effortlessly manageable.

Whether you’re navigating fluctuating traffic, managing asynchronous tasks, or developing agile microservices, KEDA equips your team to scale workloads effectively, ensuring precision, efficiency, and responsiveness.

Now it’s your turn: dive into hands-on deployment, observe the system’s performance, and start crafting customized autoscaling strategies designed specifically for your workload’s requirements. Happy scaling!

Want to know more about what we do?

We are your dedicated partner. Reach out to us.