Kafka & Kubernetes: Scaling Consumers

Kafka and Kubernetes (K8s) are a great match. Kafka exposes tuning knobs to optimize per-consumer throughput, and Kubernetes scales horizontally to multiply that throughput.

On the consumer side, there are a few ways to improve scalability.

  1. Resource & Client Tuning
  2. Horizontal Pod Autoscaling (HPA)
  3. Horizontal Workload Scaling

Let’s jump right in.

Resource & Client Tuning

Kafka consumers usually have a very specific job to perform on each Kafka record. As a result, resource allocation is not typically the bottleneck. If anything, we want to allocate as little as possible so that the HPA (see next section) can be as effective as possible. With monitoring in place, observe and tune your service to be as powerful and efficient as it can be, so that the CPU and memory it does request are fully utilized.
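As a sketch, that philosophy translates into modest container resource requests; the values below are placeholders to illustrate the shape, not recommendations:

```yaml
# Illustrative container resources for a consumer pod.
# Keep requests close to observed steady-state usage so that the HPA,
# not over-allocation, absorbs spikes in load.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

Your monitoring data, not a blog post, should drive the actual numbers.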

If your application metrics are not already exposed, this is your first step to scaling. You cannot tune your application if you can’t observe it.

Once resource requirements are understood, squeeze out additional throughput by optimizing the consumer client configurations to meet your goals.

For example:

  • Increased throughput: Increase the amount of data in batches with fetch.min.bytes.
  • Decreased latency: Limit batch sizes with fetch.max.bytes so that batches are handled more quickly and more frequently.

There are great recipes out there for optimizing consumers. Go research the topic and figure out what makes the most sense.
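To make the two directions above concrete, here is a hedged sketch assuming a client that accepts dict-style configuration (as confluent-kafka's Consumer does); the property names are standard Kafka consumer configs, but the values are placeholders:

```python
# Illustrative consumer configurations. Property names are standard
# Kafka consumer configs; values are placeholders, not recommendations.

# Throughput-oriented: let the broker accumulate larger batches
# before answering a fetch request.
throughput_config = {
    "bootstrap.servers": "kafka:9092",  # placeholder address
    "group.id": "record-processor",
    "fetch.min.bytes": 1_048_576,       # wait for ~1 MiB per fetch
    "fetch.max.wait.ms": 500,           # but never wait longer than 500 ms
}

# Latency-oriented: cap fetch sizes so records are handed to the
# application sooner and more frequently.
latency_config = {
    "bootstrap.servers": "kafka:9092",
    "group.id": "record-processor",
    "fetch.min.bytes": 1,               # return as soon as any data exists
    "fetch.max.bytes": 262_144,         # cap each fetch at 256 KiB
}

# e.g. consumer = Consumer(throughput_config)  # confluent-kafka style
```

The trade-off is explicit: the first config favors fewer, fatter fetches; the second favors many small ones.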

Horizontal Pod Autoscaling (HPA)

Out of the box, K8s scales pods based on pod-level metrics like CPU and memory. This is great, but not ideal for Kafka consumers. As mentioned above, resources aren't typically a consumer's bottleneck. Even as lag increases, the consumer processes records as quickly as it can, which means CPU and memory stay fairly stable. However, with custom metrics support, applications can scale based on any metric, such as Kafka consumer lag. That metric is as good as it gets for understanding when you should scale out.

If you’re new to Kafka, it’s worth noting that Kafka’s unit of parallelism is the number of topic partitions, so when consuming a topic with 10 partitions the HPA can usefully scale up to only 10 pods during peak loads; any pods beyond that sit idle with no partitions assigned.
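As a rough sketch, a lag-driven HPA might look like the following. This assumes a metrics adapter (such as prometheus-adapter or KEDA) is already exposing a lag metric to the custom/external metrics API; the metric name and target value here are illustrative and entirely adapter-specific:

```yaml
# Illustrative HPA scaling a consumer Deployment on an external
# consumer-lag metric. Assumes a metrics adapter exposes
# "kafka_consumergroup_lag" -- the name depends on your setup.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: record-processor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: record-processor
  minReplicas: 1
  maxReplicas: 10   # never exceed the topic's partition count
  metrics:
    - type: External
      external:
        metric:
          name: kafka_consumergroup_lag
        target:
          type: AverageValue
          averageValue: "1000"   # target lag per pod (placeholder)
```

Note maxReplicas capped at the partition count, per the parallelism limit above.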

Partition planning, pod tuning, and an effective HPA will cover most of your scaling needs.

Most…

Horizontal Workload Scaling

There are scenarios where simply scaling a single workload to 50 pods might not help.

Here is an example: You’ve built a SaaS-like Kafka consumer that is responsible for a large and changing number of topics. These topics have varying partition counts, record counts, record sizes, and SLA requirements. If these all get wrapped up into the same consumer, the more demanding topics (high record count, large record size, etc.) will claim the majority of processing time. Scaling out wider and wider won’t fix this.

Out-of-the-box HPA scaling lacks granularity.

Helm Subcharts make it easy to deploy multiple flavors of a single workload. A workload, in this case, is the tuned consumer with HPA configured.

In the Horizontal Workload deployment model, a workload can be dedicated to a large topic while another workload focuses on a set of smaller topics. The workloads do not compete with each other and independently scale to meet the needs of the topic(s) they are responsible for. This also allows for data to be isolated to specific consumers which may be beneficial in certain environments.
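With a Helm parent chart, those flavors can be expressed as per-instance values passed to the same subchart. The names and keys below are hypothetical, loosely in the spirit of the sample repo linked later, to show the shape of the idea:

```yaml
# Illustrative parent-chart values: two flavors of one consumer subchart.
# "large-topic-processor" is dedicated to a demanding topic, while
# "small-topic-processor" handles the long tail of smaller topics.
# Each flavor gets its own HPA bounds and scales independently.
large-topic-processor:
  topics: ["orders"]
  hpa:
    minReplicas: 2
    maxReplicas: 12
small-topic-processor:
  topics: ["audit", "notifications", "metrics"]
  hpa:
    minReplicas: 1
    maxReplicas: 3
```

Because each flavor is its own release of the subchart, adding a new dedicated workload is a values change rather than new application code.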

This per-topic flexibility will allow you to efficiently scale to meet your needs.

Sample Project

Rather than clutter this blog with sample code, I created a sample repo to illustrate this deployment model. The repo is for demo purposes only.

https://github.com/schroedermatt/helm-subchart-example/tree/master/record-processor

Summary

Kafka boasts scalability. It’s been a cornerstone of the product since day one. However, it’s not always clear how to capitalize on it.

There are a few layers to take into consideration when building Kafka consumers. Use none, one, or all of them.

  1. Resource & Client Tuning – Optimize an application.
  2. Horizontal Pod Autoscaling – Autoscale the optimized application.
  3. Horizontal Workload Scaling – Scale the autoscaled, optimized application.

About the Author


Matt Schroeder

Director, Modern API

A wide range of professional experience and a Master’s Degree in Software Engineering have become the foundation that enables Matt to lead teams to the best solution for every problem.
