How to deal with DAG lib in airflow?

I've got a question about dependency management for the packages used in Python operators.

We are using Airflow in an industrialized way to run scheduled Python jobs. It works well, but we are struggling to deal with the different Python libraries each DAG needs.

Do you have any idea how to let developers install their own dependencies for their jobs without being admin, while making sure these dependencies don't collide with those of other jobs?

Would you recommend a bash task that loads a virtualenv at the beginning of the job? Is there any official recommendation for this?

Thanks! Romain.

asked Jan 03 '18 14:01 by romain-nio

1 Answer

In general I see two possible solutions for your problem:

  1. Airflow has a `PythonVirtualenvOperator`, which allows a task to run in a virtualenv that gets created and destroyed automatically. You can pass a `python_version` and a list of `requirements` to the task to build the virtualenv.

  2. Set up a Docker registry and use a `DockerOperator` rather than a `PythonOperator`. This would allow teams to build their own Docker images with their specific requirements. I believe this is how Heineken set up their Airflow jobs, as presented at their Airflow meetup. I've tried to find out whether they posted their slides online, but I can't seem to find them.
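As a sketch of option 1, a DAG using `PythonVirtualenvOperator` might look like the following. The import path shown is the Airflow 2.x one (in Airflow 1.10 the operator lives in `airflow.operators.python_operator`), and the DAG id, requirements pin, and callable are illustrative examples, not a fixed recipe:

```python
# Sketch of per-task dependency isolation with PythonVirtualenvOperator.
# The dag_id, schedule, and pandas pin below are hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator


def transform():
    # Imports must live inside the callable: it is serialized and then
    # executed inside a freshly created virtualenv, isolated from the
    # packages installed on the worker itself.
    import pandas as pd

    df = pd.DataFrame({"value": [1, 2, 3]})
    print(df["value"].sum())


with DAG(
    dag_id="isolated_deps_example",
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonVirtualenvOperator(
        task_id="transform",
        python_callable=transform,
        requirements=["pandas==1.3.5"],  # per-task pins, no admin rights needed
        system_site_packages=False,      # don't leak the worker's packages in
    )
```

Because each run builds its virtualenv from scratch, one DAG's pinned requirements cannot collide with another's; the trade-off is the extra startup latency of installing the requirements on every task run.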

answered Oct 19 '22 17:10 by Matthijs Brouns