I'm trying to get some simple rules or guidelines for what values to set for operator or job parallelism. It would seem to me that it should be a number <= the number of available task slots?
For example, suppose I have 2 task manager machines, each with 4 task slots. Assuming no other jobs running on the cluster, would I set the parallelism for operations like filter and map to 8? If not, what would be a reasonable number?
What happens if you request more parallelism than there are task slots? In the example above, what happens if I set the parallelism to 12 on the operations? I'm assuming it would just use as many slots as are available?
Also, it would seem that you would not want to hardcode the parallelism into your source code, since you would want to have a rough idea of the available task slots when you submit the job? Should you set roughly the same parallelism for all operators or use different values, and what would guide that decision?
Thanks!
In general it is a good idea not to hardcode the parallelism, because it is usually the responsibility of the operations team to decide how many resources to assign to your job. Moreover, the resource requirements usually depend on your SLAs and the actual workload; they are therefore program-independent and should be treated separately.
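As a rough illustration of keeping the parallelism out of the program logic, the sketch below reads it from the command line. The class name `ParallelismArgs` and the `--parallelism` argument are hypothetical and only for illustration; in practice Flink already lets you pass the parallelism at submission time with `flink run -p <n>`, or set a cluster-wide default with `parallelism.default` in `flink-conf.yaml`, so you may not need any code at all.

```java
// Hypothetical helper (not part of Flink): resolves the job parallelism
// from the command line instead of hardcoding it in the program.
final class ParallelismArgs {

    /** Returns the value following "--parallelism", or the fallback if the flag is absent. */
    static int resolve(String[] args, int fallback) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("--parallelism".equals(args[i])) {
                int parallelism = Integer.parseInt(args[i + 1]);
                if (parallelism < 1) {
                    throw new IllegalArgumentException(
                            "parallelism must be >= 1, got " + parallelism);
                }
                return parallelism;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        int parallelism = resolve(args, 1);
        // In a real Flink job this value would be handed to
        // StreamExecutionEnvironment#setParallelism(int).
        System.out.println("parallelism = " + parallelism);
    }
}
```

The point of the design is only that the number comes from whoever submits the job, not from the source code.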
With Flink 1.5.0 when running on Yarn or Mesos, you only need to decide on the parallelism of your job and the system will make sure that it starts enough TaskManagers with enough slots to execute your job. This happens completely dynamically and you can even change the parallelism of your job at runtime.
If you are using the standalone mode, or if your Yarn/Mesos cluster does not have enough resources/slots available, then the job will fail with a NoResourceAvailableException if the system cannot obtain the required slots.
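To make the numbers from the question concrete, here is a small worked check in plain Java. It is a simplified model, not Flink code: the class `SlotCheck` is hypothetical, and it assumes Flink's default slot sharing, under which a job needs roughly as many slots as its highest operator parallelism.

```java
// Hypothetical back-of-the-envelope check, not a Flink API.
// Assumption: default slot sharing, so the job needs as many slots
// as the highest parallelism of any of its operators.
final class SlotCheck {

    /** Total slots offered by a cluster of identical TaskManagers. */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /** True if the requested parallelism can be satisfied by the available slots. */
    static boolean fits(int maxOperatorParallelism, int availableSlots) {
        return maxOperatorParallelism <= availableSlots;
    }

    public static void main(String[] args) {
        int slots = totalSlots(2, 4); // 2 TaskManagers with 4 slots each
        System.out.println("available slots: " + slots);             // 8
        System.out.println("parallelism 8 fits: " + fits(8, slots));   // true
        System.out.println("parallelism 12 fits: " + fits(12, slots)); // false
    }
}
```

So on a standalone cluster with 2 TaskManagers of 4 slots each, a parallelism of 8 is the upper bound, and requesting 12 does not fall back to 8; the job fails as described above.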