
Is there any way to set the working directory in Airflow where my code will run?

I am trying to help my team of data scientists run their code using Airflow. The problem I face is that their Python scripts read and write some intermediate files.

1) Is there any way to set a working directory where their scripts and files can live, so that they do not clutter the dags folder?

2) Even if I use the dags folder, I would have to specify the absolute path every time I read or write those files, unless there is some other way around this?

That is, I would have to do this all the time:

import os
absolute_path = "/some/long/directory/path"
f = os.path.join(absolute_path, file_name)
Chirrut Imwe asked Mar 22 '19 03:03


2 Answers

What I do is keep a separate folder with all the modules that need to be run, and I add that folder to the Python path of the Airflow run environment.

import sys

PATH_MODULES = "/home/airflow-worker-1/airflow_modules/"

sys.path += [PATH_MODULES]

This way, I can import any functions from the folders inside it, provided that each folder has an __init__.py, because they are treated as packages.

airflow_modules
    |_ code_repository_1
    |_ code_repository_2
    |_ code_repository_3
       |_ file_1.py
       |_ config.py

So in your DAG code you use:

from code_repository_1.data_cleaning import clean_1
from code_repository_2.bigquery_operations import operation_1

One thing to keep in mind: because the repositories are treated as packages, if file_1.py needs to import a variable from config.py, you have to use a relative import such as from .config import variable_1.
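The layout and the relative import described above can be tried out in isolation. The following is only a minimal sketch, not the answerer's actual setup: it assumes a temporary directory standing in for PATH_MODULES, and the describe function and the value stored in variable_1 are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for PATH_MODULES; a temp dir keeps the sketch self-contained.
modules_root = Path(tempfile.mkdtemp())

# Build code_repository_3 as a package: __init__.py, config.py and file_1.py.
package_dir = modules_root / "code_repository_3"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "config.py").write_text("variable_1 = 'value from config'\n")
(package_dir / "file_1.py").write_text(
    "from .config import variable_1  # relative import inside the package\n"
    "\n"
    "def describe():\n"
    "    return variable_1\n"
)

# Same idea as `sys.path += [PATH_MODULES]` in the answer.
sys.path.append(str(modules_root))
importlib.invalidate_caches()

# What the DAG file would do: import through the package name.
from code_repository_3.file_1 import describe

result = describe()
print(result)
```

Running file_1.py directly as a script would fail on the relative import, which is why the DAG imports it through the package name instead.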

Meghdeep Ray answered Nov 19 '22 06:11


You can use the os module to do this. Put something like this at the top of your DAG file:

import os
os.chdir('/home/lnx/test/')

This changes the working directory for all tasks running in the DAG to /home/lnx/test, so you don't have to provide absolute paths. It does, however, need to be included at the top of every DAG file that requires this working directory.

Although this is a late answer, hopefully it can help someone else in this position.
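If changing the directory for the whole DAG file feels too broad, a narrower variant is to change it only around the code that reads and writes the intermediate files, and to restore it afterwards. This is a sketch under assumptions rather than anything Airflow-specific: working_directory and write_intermediate are hypothetical helper names, and a temporary directory stands in for /home/lnx/test/.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


def write_intermediate(file_name, text):
    """Write a file using a relative path, resolved against the current directory."""
    with open(file_name, "w") as handle:
        handle.write(text)
    return os.path.abspath(file_name)


scratch_dir = tempfile.mkdtemp()  # hypothetical stand-in for /home/lnx/test/
start_dir = os.getcwd()

with working_directory(scratch_dir):
    written_path = write_intermediate("intermediate.csv", "a,b\n1,2\n")

print(written_path)
```

A task callable could wrap its body in the same with block, so the scripts keep using relative file names while the dags folder stays clean.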

tribo32 answered Nov 19 '22 05:11