Apache Flink: Using filter() or split() to split a stream?

Question

I have a DataStream from Kafka which has 2 possible value for a field in MyModel. MyModel is a pojo with domain-specific fields parsed from a message from Kafka.

DataStream<MyModel> stream = env.addSource(myKafkaConsumer);

I want to apply window and operators on each key a1, a2 separately. What is a good way to separate them? I have 2 options filter and select in mind but don't know which one is faster.

Filter approach

stream
        .filter(<MyModel.a == a1>)
        .keyBy()
        .window()
        .apply()
        .addSink()

stream
        .filter(<MyModel.a == a2>)
        .keyBy()
        .window()
        .apply()
        .addSink()

Split and select approach

SplitStream<MyModel> split = stream.split(…)
    split
        .select(<MyModel.a == a1>)
        …
        .addSink()

    split
        .select<MyModel.a == a2>()
        …
        .addSink()

If split and select are better, how to implement them if I want to split based on the value of a field in MyModel?

Fabian Hueske · Accepted Answer

Both methods behave pretty much the same. Internally, the split() operator forks the stream and applies filters as well.

There is a third option, Side Outputs . Side outputs might have some benefits, such as different output data types. Moreover, the filter condition is just evaluated once for side outputs.

Monika X · Answer

SplitStreams and split method in DataStream are deprecated since Flink Deprecated List 1.6. It is no longer recommended to be used.

Apache Flink: Using filter() or split() to split a stream?

Tags:

apache-flink

flink-streaming

Son

2 Answers

Fabian Hueske

Monika X

Recent Activity

Donate For Us

Apache Flink: Using filter() or split() to split a stream?

Tags:

apache-flink

flink-streaming

Son

2 Answers

Fabian Hueske

Monika X

Related questions

Recent Activity

Donate For Us