Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between periodic and punctuated watermarks in Apache Flink?

Will be helpful if someone give usecase example to explain the difference between each of the Watermark API with Apache flink given below

  • Periodic watermarks - AssignerWithPeriodicWatermarks[T]
  • Punctuated Watermarks - AssignerWithPunctuatedWatermarks[T]
like image 563
Somasundaram Sekar Avatar asked Jan 23 '17 14:01

Somasundaram Sekar


People also ask

What are watermarks in Flink?

Watermark is a method to measure the progress of the event time. With event time, every input event has an embedded timestamp. This timestamp can be used for watermarks to indicate the time of incoming events to the operator.

What is backpressure in Flink?

Backpressure is an indicator that your machines or operators are overloaded. The buildup of backpressure directly affects the end-to-end latency of the system, as records are waiting longer in the queues before being processed.

Is Flink an open source?

Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala.

What is Flink application?

Flink is a versatile processing framework that can handle any kind of stream. Bounded and unbounded streams: Streams can be unbounded or bounded, i.e., fixed-sized data sets. Flink has sophisticated features to process unbounded streams, but also dedicated operators to efficiently process bounded streams.


1 Answers

The main difference between the two types of watermark is how/when the getWatermark method is called.

periodic watermark

With periodic watermarks, Flink calls getCurrentWatermark() at regular interval, independently of the stream of events. This interval is defined using

ExecutionConfig.setAutoWatermarkInterval(millis)

Use this class when your watermarks depend (even partially) on the processing time, or when you need watermarks to be emitted even when no event/elements has been received for a while.

punctuated watermarks

With punctuated watermarks, Flink calls checkAndGetWatermark() on each new event, i.e. right after calling assignWatermark(). An actual watermark is emitted only if checkAndGetWatermark returns a non-null value which is greater than the last watermark.

This means that if you don't receive any new element for a while, no watermark can be emitted.

Use this class if certain special elements act as markers that signify event time progress, and when you want to emit watermarks specifically at certain events. For example, you could have flags in your incoming stream marking the end of a sequence.

like image 92
Derlin Avatar answered Sep 21 '22 08:09

Derlin