Apache Flink Process Stream Multiple Times

Question

I am trying to use Apache Flink to process a data stream with two different algorithms. My pseudocode is as follows:

env = getEnvironment();
DataStream<Event> inputStream = getInputStream();
// How to replicate the input stream?
Array[DataStream<Event>] inputStreams = inputStream.clone()

// apply different operations on the replicated streams
outputOne = inputStreams[0].map(func1);
outputTwo = inputStreams[1].map(func2);
 ...
outputOne.addSink(sink1);
outputTwo.addSink(sink2);
env.execute();

I did some research in the Flink documentation, and it seems there is no concept of cloning a stream. Neither DataStream.iterate() nor DataStream.split() does exactly what I want. Is there an alternative to creating the stream multiple times from its source? Thank you for your help.

Fabian Hueske · Accepted Answer

"Cloning" a stream is quite simple and does not require a dedicated operator. You can just apply multiple transformations to the same DataStream. Each downstream transformation will consume the complete stream.

So in your example you do:

env = getEnvironment();
DataStream<Event> inputStream = getInputStream();

outputOne = inputStream.map(func1); // apply 1st transformation
outputTwo = inputStream.map(func2); // apply 2nd transformation
...
outputOne.addSink(sink1);
outputTwo.addSink(sink2);
env.execute();
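
For reference, here is a minimal runnable sketch of the same pattern. It assumes the Flink 1.x DataStream API for Java is on the classpath. The class names FanOutExample, Doubler, and Labeler are hypothetical stand-ins for func1 and func2, and the bounded fromElements source and print sinks stand in for the real source and sinks from the question.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FanOutExample {

    // Hypothetical stand-in for func1: doubles each value.
    public static class Doubler implements MapFunction<Integer, Integer> {
        @Override
        public Integer map(Integer value) {
            return value * 2;
        }
    }

    // Hypothetical stand-in for func2: turns each value into a labeled string.
    public static class Labeler implements MapFunction<Integer, String> {
        @Override
        public String map(Integer value) {
            return "event-" + value;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Small bounded source used only for illustration (assumption, not from the question).
        DataStream<Integer> inputStream = env.fromElements(1, 2, 3);

        // Both transformations are applied to the same DataStream,
        // so each one receives every element of inputStream.
        DataStream<Integer> outputOne = inputStream.map(new Doubler());
        DataStream<String> outputTwo = inputStream.map(new Labeler());

        // print sinks stand in for sink1 and sink2.
        outputOne.print("doubled");
        outputTwo.print("labeled");

        env.execute("fan-out example");
    }
}
```

Both branches should see all three input elements. The order in which the two sinks print relative to each other is not guaranteed.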

Tags:

java

apache-flink

Wei Ma
