
Resource overallocation to slots in Flink


Regarding the Flink features that help optimize resource usage in the cluster (as well as latency, throughput, etc.), namely slot sharing, task chaining, async I/O and dynamic scaling, I would like to ask the following questions (all in the stream processing context):

1. In which cases would someone want the number of slots in a task manager to be higher than the number of CPU cores?
2. In which cases should we prefer to split a pipeline of tasks over multiple slots (disable slot sharing) instead of increasing the parallelism, so that an application can keep up with the incoming data rate?
3. Even when using all of the features above, can the resources reserved for a slot exceed what all the tasks in that slot require, leaving reserved resources unused? Can this happen when an application has tasks with different latencies (or different parallelisms)? Or when we perform multiple aggregations (that cannot be optimised using folds or reduces) on the same window?

Thanks in advance.


asked Apr 21 '17 20:04

Luis Alves

1 Answer

1. It is usually recommended to reserve at least one CPU core per slot. One reason to configure more slots than cores is that your operators execute blocking operations: while some tasks wait, the others can keep all of your cores busy.
2. If you observe that your application cannot keep up with the incoming data rate, it is usually best to increase the parallelism (provided that the bottleneck is not an operator with parallelism 1 and that your data has enough distinct key values).

   If you have multiple compute-intensive operators in one pipeline (maybe even chained) and fewer cores per slot than such operators, it might make sense to split up the pipeline. The operators can then run more concurrently.
3. Theoretically, you can assign more resources to a slot than are actually needed, e.g. when each slot holds a single operator but has multiple cores assigned to it. Also, when operators have different parallelisms, some slots might get more sub-tasks assigned than others. One thing you can always do is monitor the execution of your job to detect under- and over-provisioning.
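
The reasoning about blocking operators and about operators with different parallelism can be illustrated with a back-of-the-envelope sketch in plain Java (no Flink dependency). Everything here is an assumption made for illustration. The blocking model treats a task that waits a fraction `blockedFraction` of the time as using only the remaining fraction of one core. The placement model assumes default slot sharing, where the job needs as many slots as its highest operator parallelism, and it puts subtask `i` of every operator into slot `i`, which Flink's real scheduler does not guarantee. The class, method and operator names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SlotSizingSketch {

    /**
     * Rough model (an assumption, not a Flink sizing rule): a task that is
     * blocked for blockedFraction of the time uses (1 - blockedFraction) of a
     * core, so this many slots are needed to keep all cores busy.
     */
    static int slotsToKeepCoresBusy(int cores, double blockedFraction) {
        if (cores < 1 || blockedFraction < 0.0 || blockedFraction >= 1.0) {
            throw new IllegalArgumentException("need cores >= 1 and 0 <= blockedFraction < 1");
        }
        return (int) Math.ceil(cores / (1.0 - blockedFraction));
    }

    /** Slots needed under default slot sharing: the highest operator parallelism. */
    static int slotsNeeded(Map<String, Integer> parallelismByOperator) {
        int max = 0;
        for (int parallelism : parallelismByOperator.values()) {
            max = Math.max(max, parallelism);
        }
        return max;
    }

    /** Simplified placement (an assumption): subtask i of each operator goes to slot i. */
    static int[] subtasksPerSlot(Map<String, Integer> parallelismByOperator) {
        int[] counts = new int[slotsNeeded(parallelismByOperator)];
        for (int parallelism : parallelismByOperator.values()) {
            for (int i = 0; i < parallelism; i++) {
                counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tasks blocked half of the time on a 4-core machine.
        System.out.println("slots for 4 cores: " + slotsToKeepCoresBusy(4, 0.5)); // prints 8

        // Hypothetical job with operators of different parallelism.
        Map<String, Integer> job = new LinkedHashMap<>();
        job.put("source", 2);
        job.put("window-aggregate", 4);
        job.put("sink", 1);

        int[] counts = subtasksPerSlot(job);
        for (int slot = 0; slot < counts.length; slot++) {
            System.out.println("slot " + slot + ": " + counts[slot] + " subtask(s)");
        }
    }
}
```

For this hypothetical job, slot 0 holds three subtasks while slots 2 and 3 hold one each. If every slot is given the same resources, the lightly loaded slots are the likely candidates for reserved but unused capacity, which is what monitoring the job would reveal.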


answered Oct 11 '22 13:10

Till Rohrmann
