
Apache Flink: guidelines for setting parallelism?

I'm trying to get some simple rules or guidelines for what values to set for operator or job parallelism. It would seem to me that it should be a number <= the number of available task slots?

For example, suppose I have 2 task manager machines, each with 4 task slots. Assuming no other jobs running on the cluster, would I set the parallelism for operations like filter and map to 8? If not, what would be a reasonable number?

What happens if you request more parallelism than there are task slots? In the example above, what happens if I set the parallelism to 12 on the operations? I'm assuming it would just use as many as are available?

Also, it would seem that you would not want to hardcode the parallelism into your source code, since you would want to have a rough idea of the available task slots when you submit the job? Should you set the parallelism of all operators to roughly the same value or to different values, and what would guide that decision?

Thanks!

ChrisRTech asked Jun 06 '18 11:06


1 Answer

In general, it is a good idea not to hardcode the parallelism, because deciding how many resources to assign to your job is usually the responsibility of whoever operates the cluster. Moreover, the resource requirements usually depend on your SLAs and the actual workload. They are therefore independent of the program and should be treated separately.
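One way to keep the parallelism out of the source code is to read it at submission time. The following is only a minimal sketch in plain Java: the `ParallelismConfig` class, the `resolveParallelism` helper, and the `--parallelism` argument name are assumptions made for illustration and are not part of Flink. In an actual job, the resolved value would be handed to the execution environment's `setParallelism` method, or the code could leave the parallelism unset and rely on the `-p` option of `flink run`.

```java
public class ParallelismConfig {

    // Hypothetical helper: returns the integer that follows "--parallelism"
    // in the program arguments, or the fallback when the flag is absent.
    public static int resolveParallelism(String[] args, int fallback) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("--parallelism".equals(args[i])) {
                int parallelism = Integer.parseInt(args[i + 1]);
                if (parallelism < 1) {
                    throw new IllegalArgumentException("parallelism must be >= 1");
                }
                return parallelism;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The fallback of 1 is an arbitrary choice for this sketch.
        int parallelism = resolveParallelism(args, 1);
        // In a Flink job this is where the value would be applied,
        // e.g. env.setParallelism(parallelism) on the execution environment.
        System.out.println("parallelism = " + parallelism);
    }
}
```

This keeps the job code identical across clusters of different sizes; only the submission command changes.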

As of Flink 1.5.0, when running on YARN or Mesos, you only need to decide on the parallelism of your job, and the system will make sure that it starts enough TaskManagers with enough slots to execute it. This happens completely dynamically, and you can even change the parallelism of your job at runtime.

If you are using standalone mode, or if your YARN/Mesos cluster does not have enough resources or slots available, the job will fail with a NoResourceAvailableException when the system cannot obtain the required slots.
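Applied to the numbers in the question, the slot arithmetic can be sketched as below. This is plain Java for illustration, not Flink code: the `SlotCheck` class and its methods are hypothetical, and the check assumes the default slot sharing behaviour, where a job needs as many slots as its highest operator parallelism.

```java
public class SlotCheck {

    // Total slots offered by a standalone cluster of identical TaskManagers.
    public static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    // Assumption: with default slot sharing, the job needs as many slots
    // as its maximum operator parallelism.
    public static boolean fits(int maxParallelism, int availableSlots) {
        return maxParallelism <= availableSlots;
    }

    public static void main(String[] args) {
        int slots = totalSlots(2, 4); // 2 TaskManagers with 4 slots each = 8 slots
        System.out.println("8 fits: " + fits(8, slots));   // prints "8 fits: true"
        System.out.println("12 fits: " + fits(12, slots)); // prints "12 fits: false"
    }
}
```

So in the standalone example, a parallelism of 8 uses every slot, while a parallelism of 12 is not scaled down to 8; the job fails as described above.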

Till Rohrmann answered Sep 27 '22 19:09