Kafka & Kubernetes: Scaling Consumers
Kafka and Kubernetes (K8s) are a great match: Kafka has the knobs to optimize throughput, and Kubernetes scales workloads to multiply it.
On the consumer side, there are a few ways to improve scalability.
- Resource & Client Tuning
- Horizontal Pod Autoscaling (HPA)
- Horizontal Workload Scaling
Let’s jump right in.
Resource & Client Tuning
Kafka consumers usually have a very specific job to perform on each Kafka record, so resource allocation is not typically the bottleneck. If anything, we want to allocate as little as possible so that the HPA (see next section) can be as effective as possible. With monitoring in place, observe and tune your service to be as powerful and efficient as it can be, so that CPU and memory are fully utilized.
If your application metrics are not already exposed, this is your first step to scaling. You cannot tune your application if you can’t observe it.
Once resource requirements are understood, squeeze out additional throughput by optimizing the consumer client configurations to meet your goals.
- Increased throughput: Raise fetch.min.bytes so the broker waits to fill larger batches before responding.
- Decreased latency: Cap batch sizes with fetch.max.bytes (and consider lowering fetch.max.wait.ms) so smaller batches are returned and processed more frequently.
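As a rough illustration, the two goals above pull the same properties in opposite directions. Here is a sketch of two hypothetical tuning profiles using standard Kafka consumer property names; the values are illustrative, not recommendations, so benchmark against your own workload.

```python
# Two hypothetical consumer tuning profiles. Property names are the
# standard Kafka consumer configs; the values are examples only.

throughput_profile = {
    "fetch.min.bytes": 1_048_576,   # wait for ~1 MiB before returning a fetch
    "fetch.max.bytes": 52_428_800,  # allow large fetch responses
    "max.poll.records": 1000,       # hand bigger batches to the processing loop
}

latency_profile = {
    "fetch.min.bytes": 1,           # return as soon as any data is available
    "fetch.max.bytes": 1_048_576,   # keep responses small so they arrive often
    "max.poll.records": 100,        # smaller batches, processed more frequently
}
```

The point is the trade-off: the same three knobs define both profiles, and your service likely lands somewhere in between.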
There are great recipes out there for optimizing consumers. Go research the topic and figure out what makes the most sense.
Horizontal Pod Autoscaling (HPA)
Out of the box, K8s scales pods based on pod-level metrics like CPU and memory. This is great, but not ideal for Kafka consumers. As mentioned above, resources aren't typically the issue: even as lag increases, the consumer processes records as quickly as it can, so CPU and memory stay fairly stable. With custom metrics support, however, applications can scale based on any metric, such as Kafka consumer lag. This metric is as good as it gets in terms of understanding when you should scale out.
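A lag-driven HPA might look like the following sketch. It assumes a metrics adapter (such as prometheus-adapter or KEDA) is already exposing consumer lag to the external metrics API; the metric name, labels, and target value here are hypothetical.

```yaml
# Sketch only: assumes a metrics adapter exposes consumer lag as
# "kafka_consumergroup_lag". Names and values are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: orders-consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: orders-consumer
  minReplicas: 1
  maxReplicas: 10            # see the partition note below
  metrics:
    - type: External
      external:
        metric:
          name: kafka_consumergroup_lag
          selector:
            matchLabels:
              consumergroup: orders
        target:
          type: AverageValue
          averageValue: "1000"   # target lag per pod
```

With AverageValue, the HPA divides total lag by the pod count and adds pods until each one's share falls under the target.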
If you’re new to Kafka, it’s worth noting that the topic partition is Kafka’s unit of parallelism: each partition is consumed by at most one member of a consumer group, so when consuming a topic with 10 partitions the HPA can only usefully scale up to 10 pods during peak loads.
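Putting the two ideas together: the HPA's documented scaling rule is desired = ceil(current × currentMetric / targetMetric), and the partition count caps how many of those replicas can do useful work. A minimal sketch of that arithmetic:

```python
import math

def desired_replicas(current_replicas: int,
                     current_lag_per_pod: float,
                     target_lag_per_pod: float,
                     partitions: int) -> int:
    """Sketch of the HPA scaling rule, capped at the partition count.

    Mirrors the documented HPA formula:
        desired = ceil(current * currentMetric / targetMetric)
    Pods beyond `partitions` would sit idle, since each partition is
    consumed by at most one member of a consumer group.
    """
    desired = math.ceil(current_replicas * current_lag_per_pod / target_lag_per_pod)
    return max(1, min(desired, partitions))

# Lag per pod is 4x the target, so the HPA wants 4x the pods...
print(desired_replicas(2, 4000, 1000, partitions=10))  # -> 8
# ...but a 10-partition topic caps the group at 10 consumers.
print(desired_replicas(8, 4000, 1000, partitions=10))  # -> 10
```

This is why maxReplicas on the HPA should never exceed the partition count of the topic being consumed.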
Partition planning, pod tuning, and an effective HPA will cover most of your scaling needs.
Horizontal Workload Scaling
There are scenarios where simply scaling a single workload to 50 pods might not help.
Here is an example: You’ve built a SaaS-like Kafka consumer that is responsible for a large and changing number of topics. These topics have varying partition counts, record counts, record sizes, and SLA requirements. If these all get wrapped up into the same consumer, the more demanding topics (high record count, large record size, etc.) will claim the majority of processing time. Scaling out wider and wider won’t fix this.
Out-of-the-box HPA scaling lacks granularity.
Helm subcharts make it easy to deploy multiple flavors of a single workload. A workload, in this case, is the tuned consumer with HPA configured.
In the Horizontal Workload deployment model, a workload can be dedicated to a large topic while another workload focuses on a set of smaller topics. The workloads do not compete with each other and independently scale to meet the needs of the topic(s) they are responsible for. This also allows for data to be isolated to specific consumers which may be beneficial in certain environments.
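To make that concrete, here is a hypothetical parent-chart values.yaml that instantiates the same consumer subchart twice with different tuning. Every name and key below is illustrative, not a real chart.

```yaml
# Sketch only: two flavors of one hypothetical consumer subchart.
consumer-large-topic:
  topics: ["orders"]           # one demanding topic, isolated
  hpa:
    minReplicas: 4
    maxReplicas: 24            # matches the topic's partition count
  config:
    fetch.min.bytes: 1048576   # tuned for throughput

consumer-small-topics:
  topics: ["audit", "email", "metrics"]
  hpa:
    minReplicas: 1
    maxReplicas: 6
  config:
    fetch.min.bytes: 1         # tuned for latency
```

Each flavor carries its own HPA and client tuning, so the demanding topic can no longer starve the smaller ones.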
This per-topic flexibility will allow you to efficiently scale to meet your needs.
Rather than clutter this blog with sample code, I created a sample repo to illustrate this deployment model. The repo is for demo purposes only.
Kafka boasts scalability. It’s been a cornerstone of the product since Day 1. However, it’s not always clear how we can capitalize on this.
There are a few layers to take into consideration when building Kafka consumers. Use none, one, or all of them.
- Resource & Client Tuning – Optimize an application.
- Horizontal Pod Autoscaling – Autoscale the optimized application.
- Horizontal Workload Scaling – Scale the autoscaled, optimized application.