Do Airflow workers share the same file system, or are they isolated?

Tags: airflow

I have a task in Airflow that downloads a file from GitHub to the local file system, passes it to spark-submit, and then deletes it. I wanted to know if this will create any issues.

Is it possible that two workers running the same task concurrently on two different DAG runs end up referencing the same file?

Sample code:

def python_task_callback():
    # All three steps run inside one task, on one worker's local disk.
    download_file(file_name='script.py')   # fetch from GitHub to /temp
    spark_submit(path='/temp/script.py')   # hand the local copy to spark-submit
    delete_file(path='/temp/script.py')    # clean up the local copy
Asked Nov 20 '25 by Shivansh Narayan
1 Answer

For your use case, if you perform all of the actions you mentioned (download, spark-submit, delete) within a single task, you will have no problems regardless of which executor you are running, because a single task always executes on a single worker.
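
For illustration, here is a minimal self-contained sketch of that single-task callback; the GitHub URL and the /tmp path are assumptions, and the helper names from the question are replaced with standard-library calls:

import os
import subprocess
import urllib.request

def python_task_callback():
    # Everything happens on whichever single worker picks up this task,
    # so no other worker ever needs to see the local file.
    url = "https://raw.githubusercontent.com/example/repo/main/script.py"  # placeholder URL
    urllib.request.urlretrieve(url, "/tmp/script.py")
    subprocess.run(["spark-submit", "/tmp/script.py"], check=True)
    os.remove("/tmp/script.py")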

If you are splitting the actions across several tasks, then you should use shared storage such as S3 or Google Cloud Storage. In that case it will also work regardless of which executor you are using. A possible workflow (sketched after the list below) could be:

1st task: copy file from github to S3

2nd task: submit the file to processing

3rd task: delete the file from S3
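
A minimal sketch of that three-task split, assuming Airflow 2.4+ with the TaskFlow API and boto3 for the S3 steps; the bucket name, key, and URL are placeholders for illustration:

from datetime import datetime
import subprocess
import urllib.request

import boto3
from airflow.decorators import dag, task

BUCKET = "my-bucket"                  # assumption: replace with your bucket
KEY = "scripts/script.py"             # assumption: object key in S3
URL = "https://raw.githubusercontent.com/example/repo/main/script.py"  # placeholder

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def github_to_spark():
    @task
    def copy_to_s3():
        # 1st task: copy the file from GitHub to S3 via this worker's disk.
        urllib.request.urlretrieve(URL, "/tmp/script.py")
        boto3.client("s3").upload_file("/tmp/script.py", BUCKET, KEY)

    @task
    def submit():
        # 2nd task: may run on a different worker, so fetch from S3 first.
        boto3.client("s3").download_file(BUCKET, KEY, "/tmp/script.py")
        subprocess.run(["spark-submit", "/tmp/script.py"], check=True)

    @task
    def cleanup():
        # 3rd task: delete the shared copy from S3.
        boto3.client("s3").delete_object(Bucket=BUCKET, Key=KEY)

    copy_to_s3() >> submit() >> cleanup()

github_to_spark()

Because the file lives in S3 between tasks, it no longer matters which worker each task lands on.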


As for your general question of whether tasks share a disk - that depends on the executor you are using.

With the Local Executor there is only one worker, so all tasks run on the same machine and share its disk.

With the Celery Executor, Kubernetes Executor, and others, tasks may run on different workers, each with its own disk.

However, as mentioned, don't assume that tasks share a disk: if you later need to scale from the Local Executor to the Celery Executor, you don't want to find yourself refactoring your code.
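
For reference, the executor is configured in airflow.cfg (or via the AIRFLOW__CORE__EXECUTOR environment variable), so this one setting decides whether the shared-disk assumption currently holds:

# airflow.cfg - the executor determines whether tasks can share a disk
[core]
executor = LocalExecutor        # one machine: tasks share its disk
# executor = CeleryExecutor     # distributed workers: no shared local disk
# executor = KubernetesExecutor # one pod per task: no shared local disk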

Answered Nov 21 '25 by Elad Kalif

