
What do terms like Hash, Forward mean in the Flink plan?

This is an image of the Flink plan that appears on the dashboard when I deploy my job. As you can see, the connections between operators are marked as FORWARD/HASH etc. What do they refer to? When is something called a HASH and when is something called a FORWARD?

[image: Flink job plan as shown on the dashboard]

Harshith Bolar asked Jan 25 '19 07:01


People also ask

What does KeyBy do in Flink?

According to the Apache Flink documentation, KeyBy transformation logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition.

What are operators in Flink?

An Apache Flink operator transforms one or more data streams into a new data stream. The new data stream contains modified data from the original data stream. Apache Flink provides more than 25 pre-built stream processing operators. For more information, see Operators in the Apache Flink Documentation.

What is keyed stream in Flink?

Flink distributes the events in a data stream to different task slots based on the key. Flink uses hashing algorithms to divide the stream into partitions based on the number of slots allocated to the job. It then distributes the same keys to the same slots.

What is chaining in Flink?

For distributed execution, Flink chains operator subtasks together into tasks. Each task is executed by one thread. Chaining operators together into tasks is a useful optimization: it reduces the overhead of thread-to-thread handover and buffering, and increases overall throughput while decreasing latency.


1 Answer

First of all, a Flink streaming job is split into several tasks according to its job graph (or DAG). FORWARD/HASH is the partitioner between the upstream tasks and the downstream tasks; it decides which downstream subtask receives each record.

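What is Forward? And when does Forward occur?

This means the partitioner forwards elements only to the locally running downstream task. Forward is the default partitioner if you don't specify a partitioner directly and don't use a function that implies one, like rebalance/keyBy. Note that this default only applies when the upstream and downstream operators have the same parallelism; if the parallelism differs, Flink falls back to rebalance.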
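To make the difference concrete, here is a minimal plain-Java sketch of how forward and rebalance routing pick a downstream subtask. It does not use Flink's classes; `ForwardSketch`, `forwardTarget` and `rebalanceTargets` are hypothetical names for illustration, and the round-robin here always starts at subtask 0, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of forward vs. rebalance routing; not Flink's actual classes.
final class ForwardSketch {

    // Forward: upstream subtask i sends only to downstream subtask i,
    // which is why both sides need the same parallelism.
    static int forwardTarget(int upstreamSubtask, int upstreamParallelism, int downstreamParallelism) {
        if (upstreamParallelism != downstreamParallelism) {
            throw new IllegalArgumentException("forward needs equal parallelism");
        }
        return upstreamSubtask;
    }

    // Rebalance: records are spread round-robin over all downstream subtasks.
    // Assumption: we start at subtask 0 to keep the sketch deterministic.
    static List<Integer> rebalanceTargets(int recordCount, int downstreamParallelism) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < recordCount; i++) {
            targets.add(i % downstreamParallelism);
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(forwardTarget(2, 4, 4));
        System.out.println(rebalanceTargets(5, 3));
    }
}
```

With forward, a record never leaves its "lane", so no network shuffle is needed and the two operators can be chained into one task.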

What is Hash? And when does Hash occur?

This is a partitioner that partitions the records based on the key group index, which is derived from the hash of each record's key. It occurs when you call keyBy.
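As a rough illustration of the key group idea, the plain-Java sketch below maps a key to a key group and then to a downstream subtask. `HashSketch`, `keyGroup` and `targetSubtask` are hypothetical names, and the sketch uses the key's plain `hashCode()` where Flink applies an additional murmur hash, so the concrete numbers will not match a real job.

```java
// Simplified model of hash (keyBy) routing; not Flink's actual classes.
final class HashSketch {

    // Key group = hash of the key, folded into [0, maxParallelism).
    // Assumption: plain hashCode() stands in for Flink's murmur hash step.
    static int keyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Each downstream subtask owns a contiguous range of key groups.
    static int targetSubtask(Object key, int maxParallelism, int parallelism) {
        return keyGroup(key, maxParallelism) * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // number of key groups
        int parallelism = 4;      // number of downstream subtasks
        for (String key : new String[] {"a", "b", "a"}) {
            System.out.println(key + " -> subtask " + targetSubtask(key, maxParallelism, parallelism));
        }
    }
}
```

The point to take away is that equal keys always produce the same key group and therefore always land on the same downstream subtask, regardless of which upstream subtask emitted them.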

Jiayi Liao answered Oct 24 '22 15:10