In airflow.cfg there is a section called [operators], where default_cpus was set to 1 and default_ram and default_disk were both set to 512.
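For reference, those values can be read back programmatically, which is a quick way to confirm what your installation is actually running with. A minimal sketch, assuming Airflow 1.9, where the parsed airflow.cfg is exposed through the airflow.configuration module (in later versions the equivalent object is airflow.configuration.conf):

```python
# Minimal sketch: read the [operators] defaults from the parsed airflow.cfg.
# Assumes Airflow 1.9; in later versions use airflow.configuration.conf instead.
from airflow import configuration

default_cpus = configuration.getint('operators', 'default_cpus')  # 1 in the config described above
default_ram = configuration.getint('operators', 'default_ram')    # 512
default_disk = configuration.getint('operators', 'default_disk')  # 512

print(default_cpus, default_ram, default_disk)
```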
I would like to understand whether I will get improvements in processing speed if I increase these parameters.
I took a look at the sources: these settings are made available to all operators, but they are never used, neither by the operators themselves nor by any executor.
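To make "available but never used" concrete: as far as I can tell from the 1.9 sources, every operator ends up with a resources attribute filled from those defaults (and you can override it per task), but none of the bundled executors ever reads it when scheduling work. A minimal sketch of what that looks like, assuming Airflow 1.9 (DummyOperator and the resources keyword exist there; the printed representation may differ between versions):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG('resources_demo', start_date=datetime(2018, 1, 1), schedule_interval=None)

# No explicit resources: the task picks up default_cpus / default_ram /
# default_disk from the [operators] section of airflow.cfg.
t1 = DummyOperator(task_id='with_defaults', dag=dag)

# Explicit per-task resources: accepted and stored on the task...
t2 = DummyOperator(task_id='with_overrides', dag=dag,
                   resources={'cpus': 2, 'ram': 1024, 'disk': 1024})

# ...but, as noted above, nothing in the stock executors consumes these values.
print(t1.resources)
print(t2.resources)
```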
So I went a little bit back into the history and had a look at the commit that introduced those settings. Quoting the JIRA ticket that led to that PR, they are:
optional resource requirements for use with resource managers such as yarn and mesos
The Mesos executor, however, is a community contribution that does not leverage these properties and just assigns the same amount of resources to every task, and the YARN executor does not exist yet, as far as I know (as of version 1.9).
I once had a discussion with the Airflow team about whether there was a way to assign resources on a per-task basis using the Mesos executor, and they replied with their strategy for assigning resources to tasks using the Celery executor, in case it helps you understand how to manage resources.
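I can't speak for the exact strategy they described, but the common way to partition resources with the Celery executor is to route tasks to dedicated queues and run differently sized workers (or workers with different concurrency) against each queue. A minimal sketch, assuming Airflow 1.9 with the CeleryExecutor; the queue name and commands are made up for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('queue_routing_demo', start_date=datetime(2018, 1, 1), schedule_interval=None)

# Tasks on the default queue are picked up by any plain `airflow worker`.
light = BashOperator(task_id='light_job', bash_command='echo light', dag=dag)

# Heavy tasks go to a dedicated queue; only workers started with
# `airflow worker -q heavy` (e.g. on bigger machines, or with lower
# concurrency) will execute them.
heavy = BashOperator(task_id='heavy_job', bash_command='echo heavy',
                     queue='heavy', dag=dag)

light >> heavy
```

This way the resource isolation happens at the worker level (which machines listen on which queue) rather than through the default_cpus / default_ram / default_disk settings.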
Regarding your core question in a more general sense: the throughput you can get out of a task in relation to the resources it is assigned depends a lot on the task itself. A compute-intensive task that can leverage multiple processors will see a speed-up if you assign it more cores, while an I/O-intensive task (like copying data between different systems) will probably not see much improvement.
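As a generic illustration of that difference (plain Python, not tied to any Airflow API): a CPU-bound function can be spread across worker processes and scales with the number of cores, whereas an I/O-bound copy is limited by disk or network throughput no matter how many cores it is given:

```python
import multiprocessing
import time


def cpu_bound(n):
    # Compute-heavy loop: this kind of work benefits from more cores.
    return sum(i * i for i in range(n))


if __name__ == '__main__':
    for workers in (1, 4):
        start = time.time()
        with multiprocessing.Pool(processes=workers) as pool:
            # The same four chunks of work, run on 1 vs 4 processes.
            pool.map(cpu_bound, [10000000] * 4)
        print('%d worker(s): %.2fs' % (workers, time.time() - start))
    # An I/O-bound task (e.g. shutil.copyfile between two systems) would show
    # roughly the same wall-clock time regardless of the core count.
```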