Kafka is an exciting space, and what I have found most exciting is the data streaming aspect of it. Kafka Connectors are the bookends to a powerful streaming engine that quickly transforms and analyzes data at scale. Source Connectors pull data from data sources and pump it into Kafka topics, while Sink Connectors push data out of Kafka topics to a new data source or file system (like HDFS) for deeper analysis. The Kafka community has an open source mindset, which has led to a large library of existing Connectors that engineers have built and made available. If you’re thinking of using a data source (e.g. S3 or Elasticsearch) as a source or sink, the odds are a connector has already been built.
If you do end up having to build your own Connector, don’t fret! Confluent, a leader in the Kafka space, has put together a nice developer guide to help you along the way. One thing I noticed while building my first Connector was that Maven was the most widely used build tool in examples and documentation. That didn’t sit well with me, because Gradle has become my de facto build tool (and for good reason), so I’m going to show you the Gradle build I ended up with to package my Connector. I’ll also touch briefly on how I get the artifact onto the Connect docker container that I run locally for testing.
Before getting to the build script, I want to point out that there are two main approaches to installing a Kafka Connect plugin. The first is to build a single uber JAR containing all of the class files for the plugin and its third-party dependencies. The second is to create a directory on the file system that contains the JAR files for the plugin and its third-party dependencies. I went with the first approach because it felt cleaner and is easier to distribute when scaling out the Connect workers.
The build.gradle snippet below should make sense to anyone familiar with Gradle, but there are a few things to note.
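As a rough sketch of what such a build can look like, here is a minimal build.gradle using the Shadow plugin to produce the uber JAR. The plugin version, the Kafka version, and the example Gson dependency are all illustrative assumptions, not taken from the original build:

```groovy
plugins {
    id 'java'
    // Shadow plugin produces the uber JAR; version is illustrative
    id 'com.github.johnrengelman.shadow' version '7.1.2'
}

repositories {
    mavenCentral()
}

dependencies {
    // connect-api is provided by the Connect runtime, so keep it
    // out of the uber JAR by declaring it compileOnly
    compileOnly 'org.apache.kafka:connect-api:3.4.0'
    // third-party dependencies declared as implementation get bundled
    implementation 'com.google.code.gson:gson:2.10.1'
}

shadowJar {
    // drop the default "-all" classifier so the artifact
    // is named like a regular JAR
    archiveClassifier = ''
}
```

Declaring connect-api as compileOnly matters: the Connect worker already has those classes on its classpath, and bundling a second copy into the plugin JAR can lead to classloading conflicts.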
Once the build is configured, generating the uber JAR is as simple as running
./gradlew clean shadowJar, which will place the JAR in the ./build/libs directory.
As mentioned earlier, there are a couple of ways to install a plugin into the Kafka Connect service. The easiest way I found was to drop the artifact into the
/usr/share/java directory, where Kafka Connect already looks for preconfigured plugins. To do this I mounted my
./build/libs directory to the existing plugin directory on the docker container and created a new folder for my plugin. Keep in mind that Kafka Connect will only pick up folders prefixed with “kafka-connect”. Below is a snippet showing the volume mount I added to my docker-compose file. I have excluded the rest of the Connect service’s configuration, but you can find the full file that I forked in Confluent’s docker examples.
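A sketch of what that volume mount can look like in the Connect service definition. The image tag and the plugin folder name (kafka-connect-my-plugin) are illustrative assumptions; the folder name just needs to carry the “kafka-connect” prefix mentioned above:

```yaml
connect:
  image: confluentinc/cp-kafka-connect:7.4.0  # illustrative tag
  volumes:
    # mount the Gradle build output into the preconfigured plugin
    # directory as a new "kafka-connect"-prefixed folder
    - ./build/libs:/usr/share/java/kafka-connect-my-plugin
```

With this in place, rebuilding the uber JAR and restarting the Connect container is enough to pick up a new version of the plugin during local testing.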