
How does dataflow trigger AfterProcessingTime.pastFirstElementInPane() work?

In Dataflow streaming, my understanding of the following:

Window.into(FixedWindows.of(Duration.standardHours(1)))
  .triggering(AfterProcessingTime.pastFirstElementInPane()
      .plusDelayOf(Duration.standardMinutes(15)))

is that, for a fixed window of one hour, the trigger waits 15 minutes after it sees the first element in the pane and batches whatever arrives in that time.

But when I say:

Window.into(FixedWindows.of(Duration.standardHours(1)))
  .triggering(AfterProcessingTime.pastFirstElementInPane())

does it fire on every element, starting with the first one it sees, or does it implicitly batch elements? I ask because firing on every element overloads the system.

Anil Muppalla asked May 10 '17 19:05


1 Answer

With either of those triggers, the window will fire once, and any elements that arrive after that firing will be discarded. You can wrap the trigger in Repeatedly.forever(...) to fire multiple times.

Regarding your specific question, there is a small amount of batching that happens if elements arrive around the same time.

Assuming you meant the following, then yes: the second one will trigger much more often and may overload the system.

Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()
    .plusDelayOf(Duration.standardMinutes(15)))

vs.

Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane())
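
To make the difference concrete, here is a rough toy model in plain Java. It is not Beam code and not how the runner is implemented: the class TriggerSketch, the method paneSizes, and the arrival times are all hypothetical, and the model ignores watermarks, keys, and window expiry. It only assumes that the first element of a pane starts a processing-time timer, and that everything arriving up to the moment the timer fires lands in that pane.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a repeated "past first element in pane, plus delay" trigger. */
public class TriggerSketch {

    /**
     * Returns the number of elements in each fired pane for a single window.
     * arrivalSeconds: processing-time arrivals in seconds, sorted ascending.
     * delaySeconds: the plusDelayOf amount in seconds (0 means no delay).
     */
    public static List<Integer> paneSizes(long[] arrivalSeconds, long delaySeconds) {
        List<Integer> panes = new ArrayList<>();
        int i = 0;
        while (i < arrivalSeconds.length) {
            // The first element of a new pane sets the timer.
            long fireAt = arrivalSeconds[i] + delaySeconds;
            int count = 0;
            // Elements arriving no later than the firing time join this pane.
            while (i < arrivalSeconds.length && arrivalSeconds[i] <= fireAt) {
                count++;
                i++;
            }
            panes.add(count);
        }
        return panes;
    }

    public static void main(String[] args) {
        // Hypothetical load: one element every 60 seconds for an hour.
        long[] arrivals = new long[60];
        for (int k = 0; k < 60; k++) {
            arrivals[k] = k * 60L;
        }
        System.out.println("15-minute delay: " + paneSizes(arrivals, 15 * 60).size() + " firings");
        System.out.println("no delay: " + paneSizes(arrivals, 0).size() + " firings");
    }
}
```

With these made-up arrivals the model fires 4 times with the 15-minute delay and 60 times without it. Elements that share the same arrival time still end up in one pane when the delay is zero, which is the toy version of the small amount of batching mentioned above.
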
Ben Chambers answered Nov 05 '22 13:11