Unleashing Feature Flags onto Kafka Consumers

Feature flags are a tool to strategically enable or disable functionality at runtime. They are often used to drive different user experiences but can also be useful in real-time data systems. In this post, we’ll walk through using feature flags to pause and resume a Kafka Consumer on the fly.

You may be wondering, why not just shut down the consumer(s) to “pause” processing? This approach is 100% effective and accomplishes the end goal of stopping consumption. However, there are downsides when dealing with a large-scale consumer architecture.

  • Rebalances – depending on the layout of your consumer group and the number of partitions, the rebalance process that triggers when a consumer leaves the group can be painful and time-consuming.
  • Lack of Granularity – if the consumer is reading multiple topics, you shut down consumption of everything. There is no in-between.
  • Slow Recovery – to re-enable processing, all of the consumers have to start back up and rebalance partitions. With 100+ consumer instances, this can be time-consuming.

Before getting into how to incorporate feature flags we should look at the functionality that already exists in the Kafka client libraries to dynamically pause and resume processing. We’re already closer to the end goal than you may have realized.

Pausing Kafka Consumers

The Kafka Consumer client has the functionality to fetch assigned partitions, pause specific partitions, fetch currently paused partitions, and resume specific partitions. These methods are all you need to pause and resume consumption dynamically.

Let’s look closer at these operations.

  • KafkaConsumer.assignment(): returns the currently assigned topic partitions as a Set<TopicPartition>. Each TopicPartition contains two primary properties: the topic name as a String and the partition as an int.
  • KafkaConsumer.pause(Collection<TopicPartition>): suspends fetching from the provided topic partitions. Consumer polling continues, which is key to keeping the consumer alive and avoiding rebalances. However, future polls will not return any records from paused partitions until they have been resumed.
  • KafkaConsumer.paused(): returns the topic partitions that are currently assigned and in a paused state as a result of using the pause method.
  • KafkaConsumer.resume(Collection<TopicPartition>): resumes fetching from the provided topic partitions.

Option 1: Pause the World

The simplest way to pause processing is to pause everything. The snippet below illustrates this.
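A minimal sketch of pausing everything, assuming the consumer is driven from a standard poll loop (the class and method names are illustrative):

```java
import java.util.Set;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

// These helpers must be called from the polling thread:
// KafkaConsumer is not thread-safe.
public final class PauseTheWorld {

    // Suspend fetching from every assigned partition. Polling continues,
    // so the consumer stays in the group and no rebalance is triggered.
    public static void pauseAll(Consumer<?, ?> consumer) {
        Set<TopicPartition> assigned = consumer.assignment();
        consumer.pause(assigned);
    }

    // Resume fetching from everything that was previously paused.
    public static void resumeAll(Consumer<?, ?> consumer) {
        consumer.resume(consumer.paused());
    }
}
```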

Option 2: Selective Pause

It’s helpful to have the big red button that shuts down all processing, but it would be even more helpful to control things at a more granular level too.

The next snippet shows what pausing the partitions for a single topic might look like.
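A sketch of that selective pause, where pausedTopic names the topic to suspend (the helper name is illustrative):

```java
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

public final class SelectivePause {

    // Pause only the assigned partitions belonging to one topic;
    // partitions of all other topics keep flowing.
    public static void pauseTopic(Consumer<?, ?> consumer, String pausedTopic) {
        Set<TopicPartition> toPause = consumer.assignment().stream()
                .filter(tp -> tp.topic().equals(pausedTopic))
                .collect(Collectors.toSet());
        consumer.pause(toPause);
    }
}
```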

Unleash the Flags

With the previous snippet in hand, we’re almost there. We need to wire up a feature flag solution to provide the “pausedTopic” value(s) at runtime. Using a naming convention such as topic_{topic-name-here}, the application can pull all of the flags and filter out only those that it cares about.

The pseudo-code below shows what this might look like.
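Realized as code, that filtering might look like the following; the prefix and helper name are assumptions based on the topic_{topic-name-here} convention:

```java
import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;

public final class FlagNames {

    static final String TOPIC_FLAG_PREFIX = "topic_";

    // From all enabled flags, keep only the topic_* ones that refer to a
    // topic this application actually consumes, and strip the prefix.
    public static Set<String> pausedTopics(Collection<String> enabledFlags,
                                           Set<String> subscribedTopics) {
        return enabledFlags.stream()
                .filter(flag -> flag.startsWith(TOPIC_FLAG_PREFIX))
                .map(flag -> flag.substring(TOPIC_FLAG_PREFIX.length()))
                .filter(subscribedTopics::contains)
                .collect(Collectors.toSet());
    }
}
```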

You can integrate any feature flag solution, but let’s wire up Unleash.

Unleash

Unleash is an open-source feature management platform that has worked well for me. GitLab even offers it as a built-in service, available in all tiers as of GitLab 13.5. The Unleash documentation covers the full list of capabilities.

Below is an example of configuring the Unleash client in a Java Spring Boot application. If you’re using GitLab, there are clear instructions on integrating feature flags with your application.
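A sketch of that configuration using the Unleash Java SDK. The URL, app name, instance id, and token are placeholders, and the exact builder options depend on your SDK version:

```java
import io.getunleash.DefaultUnleash;
import io.getunleash.Unleash;
import io.getunleash.util.UnleashConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class UnleashConfiguration {

    // Expose a single Unleash client as a Spring bean. The client refreshes
    // toggle state on a background thread at a configurable interval.
    @Bean
    public Unleash unleash() {
        UnleashConfig config = UnleashConfig.builder()
                .appName("kafka-consumer-app")                 // placeholder
                .instanceId("consumer-instance-1")             // placeholder
                .unleashAPI("https://unleash.example.com/api") // placeholder
                .apiKey("<client-api-token>")                  // placeholder
                .build();
        return new DefaultUnleash(config);
    }
}
```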

Fetching Feature Flags

We’ve looked at the pause/resume functionality and how a feature flag naming convention can be used to target the topics that should be paused/resumed. Let’s tie the two together with two strategies for fetching and applying the toggles.

Option 1: Poll Loop

The simplest option is to integrate the feature flag checks right into the poll loop. Unleash refreshes the flag states on a background thread, so each check reads the latest values from a local cache to see what, if anything, needs to be paused. The benefit of this approach is that everything happens right on the polling thread, which is important since the consumer is not thread-safe.
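A sketch of one iteration of that loop. To keep the flag lookup swappable, the check is passed in as a predicate; with Unleash it would be something like topic -> unleash.isEnabled("topic_" + topic). The class and method names are illustrative:

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

public final class FlagAwarePollLoop {

    // One iteration: apply the current flag state, then poll.
    // Everything runs on the polling thread, so no thread-safety concerns.
    public static ConsumerRecords<String, String> pollOnce(
            Consumer<String, String> consumer,
            Predicate<String> topicPaused) {
        Set<TopicPartition> assigned = consumer.assignment();
        Set<TopicPartition> toPause = assigned.stream()
                .filter(tp -> topicPaused.test(tp.topic()))
                .collect(Collectors.toSet());
        Set<TopicPartition> toResume = assigned.stream()
                .filter(tp -> !topicPaused.test(tp.topic()))
                .collect(Collectors.toSet());
        consumer.pause(toPause);
        consumer.resume(toResume);
        // Paused partitions return no records, but polling keeps the
        // consumer alive in the group and avoids a rebalance.
        return consumer.poll(Duration.ofMillis(500));
    }
}
```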

Option 2: UnleashEvent Subscriber

Unleash has an internal eventing design that makes it easy to subscribe to the events that get triggered after refreshing the toggles. This is the most up-to-date representation of the flag states because the event is fired immediately after the cache is updated.

As mentioned in Option 1, Consumers are not thread-safe, so you have to handle consumer operations appropriately and run the consumer pause operations on the polling thread.

The main benefit of this is that it doesn’t clutter the poll loop with responsibilities.
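One way to sketch that hand-off: the subscriber, running on Unleash’s background thread, only publishes the desired state into a thread-safe holder, and the polling thread applies it on its next iteration. The holder below is plain JDK code; the comment notes where an UnleashSubscriber’s togglesFetched callback would call update(...), but treat that wiring as an assumption about your SDK version:

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Bridges Unleash's background refresh thread and the Kafka polling thread.
// The subscriber thread only calls update(); only the polling thread ever
// touches the consumer, keeping all consumer operations single-threaded.
public final class PausedTopicsHolder {

    private final AtomicReference<Set<String>> pending = new AtomicReference<>();

    // Called from the background thread, e.g. from an UnleashSubscriber's
    // togglesFetched(...) callback after the toggle cache is refreshed.
    public void update(Set<String> pausedTopics) {
        pending.set(Set.copyOf(pausedTopics));
    }

    // Called from the polling thread each loop iteration; returns the new
    // desired state at most once per update, empty otherwise.
    public Optional<Set<String>> drain() {
        return Optional.ofNullable(pending.getAndSet(null));
    }
}
```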

Don’t Forget about Rebalances

The final piece that needs to be accounted for is the ConsumerRebalanceListener. With multiple consumers running in a group, each is responsible for pausing its assigned partitions. If a consumer dies, its partitions are automatically rebalanced to the other consumers in the group. However, a consumer receiving newly assigned partitions has no awareness of their previous state (paused or active), so they will be active after the assignment.

The RebalanceListener is your hook into the rebalance lifecycle, letting you pause partitions before record consumption begins and avoid accidentally consuming a topic that should be paused. Using the components built above, create or update the onPartitionsAssigned method to keep partitions paused if necessary.
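A sketch of such a listener; the Supplier provides the current set of paused topic names (for example, derived from the feature flags), and the class name is illustrative:

```java
import java.util.Collection;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

public final class PauseAwareRebalanceListener implements ConsumerRebalanceListener {

    private final Consumer<?, ?> consumer;
    private final Supplier<Set<String>> pausedTopics; // current flag-derived state

    public PauseAwareRebalanceListener(Consumer<?, ?> consumer,
                                       Supplier<Set<String>> pausedTopics) {
        this.consumer = consumer;
        this.pausedTopics = pausedTopics;
    }

    // Newly assigned partitions start active, so re-pause any that belong
    // to a topic that should currently be paused.
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        Set<String> paused = pausedTopics.get();
        consumer.pause(partitions.stream()
                .filter(tp -> paused.contains(tp.topic()))
                .collect(Collectors.toSet()));
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Nothing to do: paused state for revoked partitions is discarded.
    }
}
```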

About the Author


Matt Schroeder

Director, Real-Time Data

A wide range of professional experience and a Master’s Degree in Software Engineering have become the foundation that enables Matt to lead teams to the best solution for every problem.
