Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does kafka streams compute watermarks?

Does Kafka Streams internally compute watermarks? Is it possible to observe the result of a window (only) when it completes (i.e. when watermark passes end of window)?

like image 536
arunmahadevan Avatar asked Feb 08 '19 01:02

arunmahadevan


People also ask

Should Kafka Streams have watermarks or triggers?

Back in May 2017, we laid out why we believe that Kafka Streams is better off without a concept of watermarks or triggers, and instead opts for a continuous refinement model. This article explains how we are fundamentally sticking with this model, while also opening the door for use cases that are incompatible with continuous refinement.

What is the difference between Kaka consumer and Kaka streams?

Kafka Streams supports stateless and stateful operations, but Kaka Consumer only supports stateless operations. Kafka Consumer offers you the capability to write in several Kafka Clusters, whereas Kafka Streams lets you interact with a single Kafka Cluster only. Here are the steps you can follow to connect Kafka Streams to Confluent Cloud:

What is the difference between kstream and ktable in Kafka?

Kafka Streams provides two abstractions for Streams and Tables. KStream handles the stream of records. On the other hand, KTable manages the changelog stream with the latest state of a given key. Each data record represents an update. There is another abstraction for not partitioned tables.

Where does my Kafka event stream data come from?

Your event stream data comes in from Kafka through the source nodes at the top of the topology, flows through the user processor nodes where custom-logic operations are performed, and exits through the sink nodes to a new Kafka topic.


2 Answers

Kafka Streams does not use watermarks internally, but a new feature in 2.1.0 lets you observe the result of a window when it closed. It's called Suppressed, and you can read about it in the docs: Window Final Results:

KGroupedStream<UserId, Event> grouped = ...;
grouped
    .windowedBy(TimeWindows.of(Duration.ofHours(1)).grace(ofMinutes(10)))
    .count()
    .suppress(Suppressed.untilWindowCloses(unbounded()))
like image 93
Dmitry Minkovsky Avatar answered Sep 19 '22 03:09

Dmitry Minkovsky


Does Kafka Streams internally compute watermarks?

No. Kafka Streams follows a continuous update processing model that does not require watermarks. You can find more details about this online:

  • https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/
  • https://www.confluent.io/resources/streams-tables-two-sides-same-coin

Is it possible to observe the result of a window (only) when it completes (i.e. when watermark passes end of window)?

You can observe of result of a window at any point in time. Either via subscribing to the result changelog stream via for example KTable#toStream()#foreach() (ie, a push based approach), or via Interactive Queries that let you query the result window actively (ie, a pull based approach).

As mentioned by @Dmitry, for the push based approach, you can also use suppress() operator if you are only interested in a window's final result.

like image 40
Matthias J. Sax Avatar answered Sep 20 '22 03:09

Matthias J. Sax