
Difference between job, task and subtask in flink

Tags:

apache-flink

I'm new to Flink and trying to understand the following terms:

  1. job
  2. task
  3. subtask

I searched the docs but still don't get it. What's the main difference between them?

xingbin asked Dec 04 '18 10:12




1 Answer

Tasks and sub-tasks are explained here -- https://ci.apache.org/projects/flink/flink-docs-release-1.7/concepts/runtime.html#tasks-and-operator-chains:

[Image: diagram from the linked Flink docs page showing tasks (operator chains) and their parallel subtasks]

A task is an abstraction representing a chain of operators that could be executed in a single thread. Something like a keyBy (which causes a network shuffle to partition the stream by some key) or a change in the parallelism of the pipeline will break the chaining and force operators into separate tasks. In the diagram above, the application has three tasks.
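To make the chaining rule concrete, here is a minimal sketch in plain Python (not Flink code). It models only the two chain-breaking conditions mentioned above, a network shuffle such as keyBy and a change in parallelism; Flink's real chaining logic has more conditions. The `Operator` class, the `shuffles_input` flag, and `chain_into_tasks` are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    # Hypothetical model of one operator in a linear pipeline.
    name: str
    parallelism: int
    # True if the operator's input is repartitioned over the network,
    # e.g. because a keyBy sits in front of it.
    shuffles_input: bool = False


def chain_into_tasks(operators):
    """Group a linear pipeline into tasks (operator chains).

    Simplified assumption: a new task starts at the first operator,
    at any operator with a shuffled input, or when parallelism changes.
    """
    tasks = []
    for op in operators:
        starts_new_task = (
            not tasks
            or op.shuffles_input
            or op.parallelism != tasks[-1][-1].parallelism
        )
        if starts_new_task:
            tasks.append([op])
        else:
            tasks[-1].append(op)
    return tasks


# The pipeline as described in this answer.
pipeline = [
    Operator("source", 2),
    Operator("map", 2),
    Operator("keyBy/window", 2, shuffles_input=True),
    Operator("apply", 2),
    Operator("sink", 1),
]

tasks = chain_into_tasks(pipeline)
task_names = ["/".join(op.name for op in task) for task in tasks]
print(task_names)
```

Under these assumptions the five operators collapse into three tasks: source/map, keyBy/window/apply, and sink.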

A subtask is one parallel slice of a task. This is the schedulable, runnable unit of execution. In the diagram above, the application is to be run with a parallelism of two for the source/map and keyBy/Window/apply tasks, and a parallelism of one for the sink -- resulting in a total of 5 subtasks (2 + 2 + 1).
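The subtask count can be checked with a few lines of plain Python (again, only an illustration, not Flink API). The task names and parallelism values are taken from the description above; the `expand_subtasks` helper and the "(i/n)" label format are assumptions made for this sketch.

```python
# Task name -> parallelism, as described in this answer.
task_parallelism = {
    "source/map": 2,
    "keyBy/window/apply": 2,
    "sink": 1,
}


def expand_subtasks(task_parallelism):
    """Return one label per parallel slice of every task."""
    return [
        f"{name} ({index + 1}/{parallelism})"
        for name, parallelism in task_parallelism.items()
        for index in range(parallelism)
    ]


all_subtasks = expand_subtasks(task_parallelism)
for label in all_subtasks:
    print(label)
print(len(all_subtasks), "subtasks in total")
```

Each task contributes as many subtasks as its parallelism, so the total is simply the sum of the parallelism values.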

A job is a running instance of an application. Clients submit jobs to the JobManager, which slices them into subtasks and schedules those subtasks for execution by the TaskManagers.

Update:

The community decided to re-align the definitions of task and sub-task to match how these terms are used in the code -- which means that task and sub-task now mean the same thing: exactly one parallel instance of an operator or operator chain. See the Glossary for more details.

David Anderson answered Nov 01 '22 10:11