 

Kafka Streams Application Updates

I've built a Kafka Streams application. It's my first one, so I'm moving out of a proof-of-concept mindset into a "how can I productionalize this?" mindset.

The tl;dr version: I'm looking for Kafka Streams deployment recommendations and tips, specifically related to updating your application code.

I've been able to find lots of documentation about how Kafka and the Streams API work, but I couldn't find anything on actually deploying a Streams app.

The initial deployment seems fairly easy: there is good documentation for configuring a Kafka cluster, you create the topics your application needs, and then you're pretty much free to start it up and publish data for it to process.
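For context, here is a minimal sketch of the kind of startup configuration involved. The application id and broker address are illustrative assumptions; the key point is that `application.id` doubles as the consumer group id and as the prefix for all internal (changelog and repartition) topic names, which is why it matters so much for upgrades.

```java
import java.util.Properties;

public class StreamsConfigSketch {
    public static Properties baseConfig() {
        Properties props = new Properties();
        // "application.id" is the consumer group id AND the prefix for every
        // internal topic the app creates (changelogs, repartition topics).
        props.put("application.id", "enrichment-app-v1");   // assumed name
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        return props;
    }

    public static void main(String[] args) {
        System.out.println(baseConfig().getProperty("application.id"));
    }
}
```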

But what if you want to upgrade your application later? Specifically, if the update contains a change to the topology. My application does a decent amount of data enrichment and aggregation into windows, so it's likely that the processing will need to be tweaked in the future.

My understanding is that changing the order of processing steps, or inserting new ones, shifts the internal ids Kafka Streams assigns to each node in the topology. At best, new state stores are created and the previous state is lost; at worst, a processing step reads from the wrong state store topic on startup. This implies that you either have to reset the application or give the new version a new application id. But there are some problems with that:

  1. If you reset the application or give a new id, processing will start from the beginning of source and intermediate topics. I really don't want to publish the output to the output topics twice.
  2. Data that is "in flight" when you stop the application for the upgrade would be lost (the old application never restarts to finish processing it).
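The id-shifting problem above comes from how the DSL auto-names topology nodes with a position-based counter (e.g. `KSTREAM-AGGREGATE-0000000002`). A pure-Java sketch of that naming scheme shows why inserting one operator breaks every downstream state store name (the operator lists here are illustrative, not a real topology):

```java
import java.util.ArrayList;
import java.util.List;

public class NodeNameShift {
    // Mimics the Streams DSL convention of naming nodes with a global,
    // position-based counter: KSTREAM-<OP>-<10-digit index>.
    public static List<String> autoName(List<String> ops) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < ops.size(); i++) {
            names.add(String.format("KSTREAM-%s-%010d", ops.get(i), i));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> v1 = autoName(List.of("SOURCE", "AGGREGATE", "SINK"));
        List<String> v2 = autoName(List.of("SOURCE", "FILTER", "AGGREGATE", "SINK"));
        // Inserting FILTER shifts AGGREGATE from index 1 to index 2, so its
        // state store / changelog topic name no longer matches the old one.
        System.out.println(v1.get(1)); // KSTREAM-AGGREGATE-0000000001
        System.out.println(v2.get(2)); // KSTREAM-AGGREGATE-0000000002
    }
}
```

One known mitigation (not mentioned in the original post) is to name every stateful operator explicitly via `Named.as(...)` / `Materialized.as(...)` in the DSL, so the names stop depending on position.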

The only way I can think to mitigate this is to:

  1. Stop data from being published to the source topics, let the application process all remaining messages, then shut it down.
  2. Truncate all source and intermediate topics.
  3. Start new version of application with a new app id.
  4. Start publishers.

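The steps above roughly correspond to the following CLI sketch, using Kafka's real `kafka-streams-application-reset` tool. The application id and topic names are assumptions carried over from the earlier example, and flag spellings vary slightly across Kafka versions (e.g. `--bootstrap-server` vs. `--bootstrap-servers`):

```shell
# 1. Stop the publishers, let the app drain, then stop the app.

# 2. Reset committed offsets and delete the app's internal topics
#    (changelog/repartition topics prefixed with the application id).
kafka-streams-application-reset.sh \
  --application-id enrichment-app-v1 \
  --bootstrap-servers localhost:9092 \
  --input-topics source-topic

# Truncate the external source/intermediate topics by deleting and
# recreating them (assumed names).
kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic source-topic

# 3. Start the new version under a new application id, e.g.
#    application.id=enrichment-app-v2, then restart the publishers.
```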
This is "okay" for now since my application is the only one reading from the source topics, and intermediate topics are not currently used beyond feeding to the next processor in the same application. But, I can see this getting pretty messy.

Is there a better way to handle application updates? Or are my steps generally along the lines of what most developers do?

asked Nov 08 '22 by shaddow

1 Answer

I think you have the full picture of the problem here, and your solution seems to be what most people do in this case.

At the most recent Kafka Summit, this question was asked after Gwen Shapira and Matthias J. Sax's talk about Kubernetes deployment. The response was the same: if your upgrade contains topology modifications, a rolling upgrade can't be done.

It looks like there is no KIP addressing this for now.

answered Nov 15 '22 by Loicmdivad