I have a DataStream from Kafka which has 2 possible value for a field in MyModel. MyModel is a pojo with domain-specific fields parsed from a message from Kafka.
DataStream<MyModel> stream = env.addSource(myKafkaConsumer);
I want to apply window and operators on each key a1, a2 separately. What is a good way to separate them? I have 2 options filter and select in mind but don't know which one is faster.
Filter approach
stream
.filter(<MyModel.a == a1>)
.keyBy()
.window()
.apply()
.addSink()
stream
.filter(<MyModel.a == a2>)
.keyBy()
.window()
.apply()
.addSink()
Split and select approach
SplitStream<MyModel> split = stream.split(…)
split
.select(<MyModel.a == a1>)
…
.addSink()
split
.select<MyModel.a == a2>()
…
.addSink()
If split and select are better, how to implement them if I want to split based on the value of a field in MyModel?
Both methods behave pretty much the same. Internally, the split()
operator forks the stream and applies filters as well.
There is a third option, Side Outputs . Side outputs might have some benefits, such as different output data types. Moreover, the filter condition is just evaluated once for side outputs.
SplitStreams and split method in DataStream are deprecated since Flink Deprecated List 1.6. It is no longer recommended to be used.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With