
Kafka -> Flink DataStream -> MongoDB

I want to set up Flink so it transforms and redirects data streams from Apache Kafka to MongoDB. For testing purposes, I'm building on top of the flink-streaming-connectors.kafka example (https://github.com/apache/flink).

Kafka streams are read properly by Flink and I can map them etc., but the problem occurs when I want to save each received and transformed message to MongoDB. The only example of MongoDB integration I've found is flink-mongodb-test on GitHub. Unfortunately, it uses a static data source (a database), not a DataStream.

I believe there should be some DataStream.addSink implementation for MongoDB, but apparently there isn't one.

What would be the best way to achieve this? Do I need to write a custom sink function, or am I missing something? Maybe it should be done in a different way?

I'm not tied to any solution, so any suggestion would be appreciated.

Below is an example of exactly what I'm getting as input and what I need to store as output.

Apache Kafka Broker <-------------- "AAABBBCCCDDD" (String)
Apache Kafka Broker --------------> Flink: DataStream<String>

Flink: DataStream.map({
    return ("AAABBBCCCDDD").convertTo("A: AAA; B: BBB; C: CCC; D: DDD")
})
.rebalance()
.addSink(MongoDBSinkFunction); // store the row in MongoDB collection

As you can see in this example, I'm using Flink mostly for buffering Kafka's message stream and some basic parsing.
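For illustration, a custom sink along the lines of the MongoDBSinkFunction above might look roughly like the following sketch. It assumes the MongoDB Java driver is on the classpath; the class name, host, database, and collection names are placeholders, not part of any official Flink connector.

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.bson.Document;

// Minimal sketch of a custom sink: one MongoClient per parallel sink task,
// each incoming record inserted as a separate document.
public class MongoDBSinkFunction extends RichSinkFunction<String> {

    // transient: the sink is serialized and shipped to the task managers,
    // so the connection must be created in open(), not in the constructor
    private transient MongoClient client;
    private transient MongoCollection<Document> collection;

    @Override
    public void open(Configuration parameters) {
        client = new MongoClient("localhost", 27017);            // connection details are assumptions
        collection = client.getDatabase("test").getCollection("messages");
    }

    @Override
    public void invoke(String value) {
        collection.insertOne(new Document("value", value));      // store the parsed row
    }

    @Override
    public void close() {
        if (client != null) {
            client.close();
        }
    }
}

It would then be plugged in exactly as in the snippet above, via .addSink(new MongoDBSinkFunction()).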

asked Feb 02 '16 by Michal


1 Answer

As an alternative to Robert Metzger's answer, you can write your results back to Kafka and then use one of the maintained Kafka connectors to drop the contents of a topic into your MongoDB database.

Kafka -> Flink -> Kafka -> Mongo/Anything

With this approach you can maintain the "at-least-once semantics" behaviour.
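For illustration, the Flink side of this approach might look roughly like the following sketch. The broker address, topic names, and the Kafka 0.9 connector classes are assumptions; a separately configured Kafka Connect MongoDB sink connector would then consume the output topic and write each record into MongoDB.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import java.util.Properties;

public class KafkaToKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumption
        props.setProperty("group.id", "flink-parser");            // assumption

        // Read the raw messages, parse them, and write the result to a second topic.
        DataStream<String> parsed = env
                .addSource(new FlinkKafkaConsumer09<String>("raw-messages", new SimpleStringSchema(), props))
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        // "AAABBBCCCDDD" -> "A: AAA; B: BBB; C: CCC; D: DDD"
                        return "A: " + value.substring(0, 3) + "; B: " + value.substring(3, 6)
                                + "; C: " + value.substring(6, 9) + "; D: " + value.substring(9, 12);
                    }
                });

        // A Kafka Connect MongoDB sink connector (configured separately) would
        // then pick up "parsed-messages" and persist each record to MongoDB.
        parsed.addSink(new FlinkKafkaProducer09<String>("localhost:9092", "parsed-messages", new SimpleStringSchema()));

        env.execute("Kafka -> Flink -> Kafka");
    }
}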

answered Oct 13 '22 by diegoreico