I am trying to use Apache Flink to process a data stream using two different algorithms. My pseudo code is as follows:
env = getEnvironment();
DataStream<Event> inputStream = getInputStream();
// How to replicate the input stream?
Array[DataStream<Event>] inputStreams = inputStream.clone()
// apply different operations on the replicated streams
outputOne = inputStreams[0].map(func1);
outputTwo = inputStreams[1].map(func2);
...
outputOne.addSink(sink1);
outputTwo.addSink(sink2);
env.execute();
I did some research with Flink documentation. It seems there is no concept of cloning a stream. Neither DataStream.iterate() nor DataStream.split() are doing exactly what I want. Is there an alternative to creating a stream multiple times from its source? Thank you for your help.
"Cloning" a stream is quite simple and does not require a dedicated operator. You can just apply multiple transformation on the same DataStream. All downstream transformations will consume the complete stream.
So in your example you do:
env = getEnvironment();
DataStream<Event> inputStream = getInputStream();
outputOne = inputStream.map(func1); // apply 1st transformation
outputTwo = inputStream.map(func2); // apply 2nd transformation
...
outputOne.addSink(sink1);
outputTwo.addSink(sink2);
env.execute();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With