
Can I increase the processing speed by adding more cpus to operators in Airflow?

In airflow.cfg there is a section called [operators], where default_cpus was set to 1 and default_ram and default_disk were both set to 512.
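For reference, the section in question can be sketched and read back as follows. The snippet is reconstructed only from the values quoted above (a real airflow.cfg has many more sections and keys), and `read_operator_defaults` is a hypothetical helper written for illustration, not part of Airflow.

```python
import configparser

# Reconstructed from the values quoted in the question (an assumption about
# the exact layout); a real airflow.cfg contains many more sections and keys.
AIRFLOW_CFG_SNIPPET = """
[operators]
default_cpus = 1
default_ram = 512
default_disk = 512
"""


def read_operator_defaults(cfg_text):
    """Hypothetical helper: return the [operators] defaults as a dict of ints."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {key: parser.getint("operators", key) for key in parser["operators"]}


defaults = read_operator_defaults(AIRFLOW_CFG_SNIPPET)
print(defaults)
```

This only shows what the settings look like when parsed; it says nothing about whether Airflow acts on them.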

I would like to understand whether I will get any improvement in processing speed if I increase these parameters.

V.Yan asked Jan 12 '18 03:01



1 Answer

I took a look at the sources: these settings are available to all operators, but they are never used, either by the operators themselves or by any executor.

So I went back through the history and looked at the commit that introduced those settings. Quoting the JIRA ticket that led to that PR, they are:

optional resource requirements for use with resource managers such as yarn and mesos

The Mesos executor, however, is a community contribution that does not leverage these properties and just assigns the same amount of resources to every task, and the YARN executor is not there yet AFAIK (as of version 1.9).

I once had a discussion with the Airflow team to understand whether there was a way to assign resources on a per-task basis using the Mesos executor. They replied with their strategy for assigning resources to tasks using the Celery executor, which may help you understand how to manage resources.

Regarding the core question in a more general sense: the throughput you can get out of a task in relation to the resources assigned to it depends a lot on the task itself. A compute-intensive task that can leverage multiple processors will see a speedup if you assign it multiple cores, while an I/O-intensive task (like copying data between different systems) will probably not see much improvement.
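As a rough way to reason about that last point, here is a minimal sketch using Amdahl's law, which is my own choice of model and not something Airflow computes. The function name `amdahl_speedup` and the parallel fractions below are made-up values for illustration only.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Estimate the speedup from Amdahl's law.

    parallel_fraction: share of the task's runtime that can use extra cores
                       (between 0.0 and 1.0).
    cores:             number of cores assigned to the task (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical tasks: the fractions are illustrative, not measurements.
compute_heavy = amdahl_speedup(parallel_fraction=0.95, cores=4)
io_heavy = amdahl_speedup(parallel_fraction=0.10, cores=4)
print(f"compute-heavy task on 4 cores: {compute_heavy:.2f}x")
print(f"I/O-heavy task on 4 cores:     {io_heavy:.2f}x")
```

Under these assumed fractions, the mostly parallel task gets close to a 3.5x speedup on four cores, while the task that spends most of its time waiting on I/O barely moves. Measuring your own tasks is the only way to know where they fall.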

stefanobaghino answered Sep 21 '22 00:09
