 

Kafka Streams: Punctuate vs Process

In a single task within the stream app, do the following two methods run independently? That is, while the "process" method is handling an incoming message from the upstream source, can the "punctuate" method also run in parallel, based on the specified schedule with WALL_CLOCK_TIME as the PunctuationType? Or do they share the same thread, so that only one of them runs at a given time? If so, would the "punctuate" method never get invoked if the "process" method keeps receiving messages continuously from the upstream source?

  • Processor.process(K key, V value)
    Process the record with the given key and value.

  • ProcessorContext.schedule(long interval, PunctuationType type, Punctuator callback)
    Schedules a periodic operation for processors.

Also, please clarify what it means for the partition id to be -1 in the punctuate method. Is the punctuate method not specific to any partition?

  • int ProcessorContext.partition()
    Returns the partition id of the current input record; could be -1 if it is not available (for example, if this method is invoked from the punctuate call)
asked Jun 09 '18 17:06 by Raman


1 Answer

Both methods are executed in a single thread. Wall-clock based punctuate() will be called regardless of whether there is input data: between calls to process(), the thread checks the system time and calls punctuate() if necessary.
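
To make the interleaving concrete, here is a minimal Java sketch of such a single-threaded loop. It is a hypothetical, simplified model, not the actual Kafka Streams thread code: the class SingleThreadLoopModel, the injected clock, and the rule for skipping missed intervals are all assumptions for illustration. Each loop iteration processes at most one record and then checks the wall clock, so punctuation fires both under continuous input and when no input arrives, but never at the same time as process().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical, simplified model of one stream thread; NOT Kafka's actual code.
// process() and punctuate() run on the same thread and are only interleaved.
class SingleThreadLoopModel {
    private final LongSupplier clock;  // injected clock so the model is deterministic
    private final long intervalMs;     // punctuation interval
    private long nextPunctuationMs;    // wall-clock time at which the next punctuation is due
    final List<String> log = new ArrayList<>();

    SingleThreadLoopModel(LongSupplier clock, long intervalMs) {
        this.clock = clock;
        this.intervalMs = intervalMs;
        this.nextPunctuationMs = clock.getAsLong() + intervalMs;
    }

    void process(String record) {
        log.add("process:" + record);
    }

    void punctuate(long now) {
        log.add("punctuate@" + now);
    }

    // Checked between records: punctuate once if due, then skip any
    // intervals that were missed (an assumed catch-up rule for this model).
    void maybePunctuate() {
        long now = clock.getAsLong();
        if (now >= nextPunctuationMs) {
            punctuate(now);
            while (nextPunctuationMs <= now) {
                nextPunctuationMs += intervalMs;
            }
        }
    }

    // One loop iteration: handle a record if there is one (null = no input), then check the clock.
    void runOnce(String recordOrNull) {
        if (recordOrNull != null) {
            process(recordOrNull);
        }
        maybePunctuate();
    }

    public static void main(String[] args) {
        long[] now = {0};  // fake wall clock, advanced by hand
        SingleThreadLoopModel loop = new SingleThreadLoopModel(() -> now[0], 100);
        loop.runOnce("a");
        now[0] = 120;
        loop.runOnce("b");   // continuous input: punctuation still fires
        now[0] = 250;
        loop.runOnce(null);  // no input: punctuation still fires
        System.out.println(loop.log);  // prints [process:a, process:b, punctuate@120, punctuate@250]
    }
}
```

Because everything runs on one thread, a process() call that takes a long time also delays the next clock check, so a wall-clock punctuation can fire later than scheduled; it is not skipped, just postponed until the thread gets back to checking the time.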

For the partition information: yes, punctuations are independent of partitions. Of course, punctuations are specific to a task; however, a task might have multiple input partitions (for example, if it executes a merge or join), so it's unclear which partition information to pass in. For simplicity, the single-partition case is treated the same way as the multi-partition case, and punctuations are decoupled from partitions.
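
The following hypothetical Java model illustrates why a punctuation has no partition to report. TaskContextModel is invented for illustration and is not Kafka's ProcessorContext implementation: a single task receives records from two input partitions, as in a join, and the model clears the current-record metadata before punctuating, so partition() returns -1, matching the documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one task's context; NOT Kafka's ProcessorContext implementation.
class TaskContextModel {
    static final int NO_PARTITION = -1;  // mirrors the documented "-1 if not available"
    private int currentPartition = NO_PARTITION;
    final List<String> events = new ArrayList<>();

    // Partition of the record currently being handled, or -1 if there is none.
    int partition() {
        return currentPartition;
    }

    // A record from one of the task's input partitions (e.g. either side of a join).
    void process(int partition, String value) {
        currentPartition = partition;
        events.add("process " + value + " from partition " + partition());
    }

    // A punctuation belongs to the whole task, so the model clears the
    // current-record metadata first and no single partition is reported.
    void punctuate() {
        currentPartition = NO_PARTITION;
        events.add("punctuate with partition " + partition());
    }

    public static void main(String[] args) {
        TaskContextModel task = new TaskContextModel();
        task.process(0, "left");   // record from the left input
        task.process(3, "right");  // record from the right input
        task.punctuate();
        // prints [process left from partition 0, process right from partition 3, punctuate with partition -1]
        System.out.println(task.events);
    }
}
```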

answered Sep 18 '22 20:09 by Matthias J. Sax