Kafka - How to use filter and filterNot at the same time?

Tags: java, apache-kafka, apache-kafka-streams

I have a Kafka stream that takes data from a topic, and needs to filter that information to two different topics.

KStream<String, Model> stream = builder.stream(Serdes.String(), specificAvroSerde, "not-filtered-topic");
stream.filter((key, value) -> new Processor().test(key, value)).to(Serdes.String(), specificAvroSerde, "good-topic");
stream.filterNot((key, value) -> new Processor().test(key, value)).to(Serdes.String(), specificAvroSerde, "bad-topic");

However, when I do it like this, it reads the data from the topic twice -- not sure if that has any impact on performance as the data gets larger. Is there a way to just filter it once and push it to two topics?

709

asked Dec 01 '16 18:12

m1771vw

1 Answer

Your approach is correct: the data is not read twice from the topic, and no internal data replication takes place. The only disadvantage of your approach is that the predicate is evaluated twice for each record (once by filter() and once by filterNot()). However, this is quite cheap and should not be a performance issue.

However, you could still improve performance by using KStream#branch(), which takes multiple predicates, evaluates them in order, and returns one output stream per predicate. If a record matches a predicate, it is put into the corresponding output stream and evaluation stops (i.e., no further predicate is evaluated for that record). This ensures that each record is added to at most one output stream, or is dropped if no predicate matches.

Thus, you can just provide two predicates to branch(): the first one is the same as your original filter() predicate, and the second one always returns true.
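To make the first-match rule concrete, here is a minimal plain-Java sketch that imitates it on an in-memory list. It does not use the Kafka Streams API at all: the class name BranchSketch, the branch() helper, and the sample predicates are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class BranchSketch {

    // Hypothetical helper (not a Kafka API): routes each record to the
    // bucket of the FIRST predicate it matches. Records that match no
    // predicate are dropped.
    public static <T> List<List<T>> branch(List<T> records, List<Predicate<T>> predicates) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < predicates.size(); i++) {
            outputs.add(new ArrayList<>());
        }
        for (T record : records) {
            for (int i = 0; i < predicates.size(); i++) {
                if (predicates.get(i).test(record)) {
                    outputs.get(i).add(record);
                    break; // stop at the first match; later predicates are skipped
                }
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Predicate<Integer>> predicates = new ArrayList<>();
        predicates.add(n -> n % 2 == 0); // bucket 0: even numbers
        predicates.add(n -> n > 10);     // bucket 1: numbers above 10 that are not even

        List<List<Integer>> out = branch(Arrays.asList(4, 11, 12, 3), predicates);
        System.out.println(out); // prints [[4, 12], [11]]
    }
}
```

Note that 12 matches both predicates but lands only in the first bucket, and 3 matches neither and is dropped.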

KStream<String, Model> stream = builder.stream(
    Serdes.String(),
    specificAvroSerde,
    "not-filtered-topic"
);
// Predicates are tested in order; each record goes to the first match only.
KStream<String, Model>[] splitStreams = stream.branch(
    (key, value) -> new Processor().test(key, value), // index 0: records that pass the test
    (key, value) -> true                              // index 1: everything else
);
splitStreams[0].to(Serdes.String(), specificAvroSerde, "good-topic");
splitStreams[1].to(Serdes.String(), specificAvroSerde, "bad-topic");

I am not sure whether this code is more readable than your original version, though. I guess it's a matter of taste; I personally like your original code better because it expresses the semantics more clearly.

The version I added should be slightly more CPU efficient: for every record that satisfies the predicate, the predicate is evaluated only once. For every record that does not satisfy it, the second predicate simply returns true (i.e., the original predicate is not evaluated a second time).

If you know that most records will end up in splitStreams[1], you could also invert the predicate (and use splitStreams[0] as the "bad" stream) to reduce the number of calls to the second, always-true predicate. But those are only micro-optimizations and should not matter.
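The difference in predicate evaluations can be illustrated with a small counting sketch in plain Java. It only simulates the two evaluation patterns on an in-memory list and does not touch Kafka; the class PredicateCountSketch, both method names, and the sample data are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class PredicateCountSketch {

    // Simulates filter(p) plus filterNot(p): the predicate runs once per
    // record in each pass. Returns the total number of calls to p.
    public static int countFilterAndFilterNot(List<Integer> records, Predicate<Integer> p) {
        AtomicInteger calls = new AtomicInteger();
        Predicate<Integer> counted = r -> { calls.incrementAndGet(); return p.test(r); };
        for (Integer r : records) { counted.test(r); }          // filter pass
        for (Integer r : records) { counted.negate().test(r); } // filterNot pass
        return calls.get();
    }

    // Simulates branch(p, alwaysTrue): p runs once per record, and only
    // non-matching records fall through to the trivial second predicate.
    public static int countBranch(List<Integer> records, Predicate<Integer> p) {
        AtomicInteger calls = new AtomicInteger();
        Predicate<Integer> counted = r -> { calls.incrementAndGet(); return p.test(r); };
        Predicate<Integer> alwaysTrue = r -> true;
        for (Integer r : records) {
            if (counted.test(r)) {
                continue; // first match wins; the second predicate is skipped
            }
            alwaysTrue.test(r); // cheap fallback, not counted as a call to p
        }
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(1, 2, 3, 4, 5, 6);
        Predicate<Integer> isEven = n -> n % 2 == 0;
        System.out.println(countFilterAndFilterNot(records, isEven)); // prints 12
        System.out.println(countBranch(records, isEven));             // prints 6
    }
}
```

In this simulation the original predicate runs twice per record in the first pattern and once per record in the second, which is the saving described above. Whether that matters in practice depends on how expensive the predicate is.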

167

answered Oct 05 '22 23:10

Matthias J. Sax

