Kafka Testing using YAML

Integration testing can be difficult for distributed systems. I’ve found this to be true when testing Kafka to ensure data is produced correctly. I’ve tried using a consumer in my tests to verify data made it into the topic. Unfortunately, the Observer Effect made this unstable because of Kafka’s design to balance partitions between consumers.

I found using a Producer Interceptor solved my testing problem in a less invasive way. (Note that for unit tests, Kafka provides MockProducer and MockConsumer classes that are quite powerful and should be used when applicable.)

Kafka Producer Interceptor

Producers can have interceptors that are given the opportunity to process records and optionally modify them. We’ll use an interceptor that logs the record to a file. The file format will be YAML. YAML allows us to append elements to the file and the format is always valid. With JSON or XML we’d need closing syntax and that becomes more difficult to record a valid file.

The interceptor is configured using the properties given to the KafkaProducer constructor:

  • interceptor.classes = LoggingProducerInterceptor
  • interceptor.LoggingProducerInterceptor.file = /tmp/producer.log

Output Format

The YAML file is an array of JSON objects. The first entry is the configuration for the producer. Each entry after that has the record thread name, key and value. The thread can be used, in addition to the key and/or value, for querying in the test.

Writing the Test

Verification is done by reading the YAML file and finding the record you expect. With Groovy Iterable extensions and YAML/JSON, this is straightforward and readable.

There are two common patterns I found in my tests. First, limiting to the current test thread, which assumes the producer interceptor is called in the same thread as the test. Second, ignoring existing records.

I wrote a helper class that provides the YAML as a collection with optional filtering by thread and/or ignoring the existing records.

Here’s an example of a test using LoggingProducerOutput:

The example is simple but your verification doesn’t have to be. The complete value of the record is available for testing.


I had unstable integration tests that would fail about 10% of the time. Using a YAML file to record producer activity removed the instability when checking for correctly produced records. Also, it can be applied to other technology beyond Kafka.

Happy Testing!

About the Author

Patrick Double profile.

Patrick Double

Principal Technologist

I have been coding since 6th grade, circa 1986, professionally (i.e. college graduate) since 1998 when I graduated from the University of Nebraska-Lincoln. Most of my career has been in web applications using JEE. I work the entire stack from user interface to database.   I especially like solving application security and high availability problems.

Leave a Reply

Your email address will not be published.

Related Blog Posts
Building Better Data Visualization Experiences: Part 1 of 2
Through direct experience with data scientists, business analysts, lab technicians, as well as other UX professionals, I have found that we need a better understanding of the people who will be using our data visualization products in order to build them. Creating a product utilizing data with the goal of providing insight is fundamentally different from a typical user-centric web experience, although traditional UX process methods can help.
Kafka Schema Evolution With Java Spring Boot and Protobuf
In this blog I will be demonstrating Kafka schema evolution with Java, Spring Boot and Protobuf.  This app is for tutorial purposes, so there will be instances where a refactor could happen. I tried to […]
Redis Bitmaps: Storing state in small places
Redis is a popular open source in-memory data store that supports all kinds of abstract data structures. In this post and in an accompanying example Java project, I am going to explore two great use […]
Let’s build a WordPress & Kernel updated AMI with Packer
First, let’s start with What is an AMI? An Amazon Machine Image (AMI) is a master image for the creation of virtual servers in an AWS environment. The machine images are like templates that are configured with […]