How to use Futures with Kafka Streams

I have a Kafka cluster from which I consume two topics and join them. With the result of the join I perform some operations against a database. All database operations are asynchronous, so they return a Future (scala.concurrent.Future, which for this purpose is much the same as java.util.concurrent.CompletableFuture). As a result I ended up with code like this:

val firstSource: KTable[String, Obj]
val secondSource: KTable[String, Obj2]

def enrich(data: ObjAndObj2): Future[EnrichedObj]
def saveResultToStorage(enrichedData: Future[EnrichedObj]): Future[Unit]

firstSource.leftJoin(secondSource, joinFunc)
           .mapValues(enrich)
           .foreach(saveResultToStorage)

Is it okay to manipulate Future values inside the stream like this, or is there a better way to handle async tasks (like .mapAsync in Akka Streams)?

asked Feb 15 '17 10:02

Arthur Kushka

1 Answer

I have this same issue. From what I can tell, Kafka Streams is not designed to handle multi-rate streaming the same way Akka Streams is. Kafka Streams has no equivalent of the multi-rate primitives Akka has like mapAsync, throttle, conflate, buffer, batch, etc. Kafka Streams is good at handling joins between topics and stateful aggregations of data. Akka Streams is good at multi-rate and asynchronous processing.

You have a few options for handling this:

1. Make a blocking call in the Kafka Streams app. This is the easiest option, and it is fine if running your Future calls in parallel would not give much more throughput than running them one at a time. Kafka Streams creates one task per input partition and runs those tasks on its stream threads, so you can use the partitioning of the Kafka topic(s) being processed to drive parallelism.
2. Handle the enrichment in Akka Streams using the Reactive Kafka library, and publish the enriched result to another Kafka topic, which you then bring into your Kafka Streams application. This is what we do for cases where the async call has a much higher parallel throughput than its end-to-end latency would suggest, such as a web service call or a query to a NoSQL database.
3. Publish all your enrichment data to its own KTable and join it in the Kafka Streams app. In fact, joining stream data with enrichment data via KTables is what Kafka Streams is good at. We use this if the enrichment data can be represented as a table. It does not work if the enrichment data must be computed on the fly.
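As a rough illustration of the first option, here is a minimal sketch that turns the asynchronous calls into blocking ones with Await.result. The case classes, the bodies of enrich and saveResultToStorage, the blockingCall helper, and the 5 second timeout are all assumptions made for the example. Unlike the question's code, saveResultToStorage takes a plain EnrichedObj here rather than a Future.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-ins for the question's types.
final case class ObjAndObj2(id: String)
final case class EnrichedObj(id: String, extra: String)

object BlockingEnrichment {
  // Placeholder async calls; a real app would hit the database here.
  def enrich(data: ObjAndObj2): Future[EnrichedObj] =
    Future(EnrichedObj(data.id, s"enriched-${data.id}"))

  def saveResultToStorage(data: EnrichedObj): Future[Unit] =
    Future(())

  // Hypothetical helper: wraps an async function so that the caller
  // (the stream thread) waits for the result. Await.result throws a
  // TimeoutException if the Future does not complete in time.
  def blockingCall[A, B](f: A => Future[B], timeout: FiniteDuration): A => B =
    a => Await.result(f(a), timeout)

  // The timeout is an assumed value; keep the total processing time per
  // poll well below the consumer's max.poll.interval.ms.
  val enrichSync: ObjAndObj2 => EnrichedObj =
    blockingCall[ObjAndObj2, EnrichedObj](enrich, 5.seconds)

  val saveSync: EnrichedObj => Unit =
    blockingCall[EnrichedObj, Unit](saveResultToStorage, 5.seconds)
}
```

In the topology, enrichSync would then be passed to mapValues and saveSync called from foreach, so the stream carries plain EnrichedObj values instead of Futures. The exact lambda syntax depends on which Kafka Streams API (Java or the Scala wrapper) and version is in use.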

answered Sep 21 '22 19:09

Charles Crain

Tags: stream, scala, apache-kafka-streams
