I'm running an airflow server and worker on different AWS machines.
I've synced the dags folder between them, run airflow initdb
on both, and checked that the dag_ids are the same when I run airflow list_tasks <dag_id>
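For reference, this is roughly what that setup looked like as commands (the rsync line is only one example of how the folder could be synced, and worker-host is a placeholder for the worker machine's address):

    # copy the dags folder from the server to the worker (rsync and worker-host are placeholders)
    rsync -avz ~/airflow/dags/ worker-host:~/airflow/dags/

    # on both machines
    airflow initdb
    airflow list_tasks <dag_id>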
When I run the scheduler and worker, I get this error on the worker:
airflow.exceptions.AirflowException: dag_id could not be found: . Either the dag did not exist or it failed to parse. [...] Command ...--local -sd /home/ubuntu/airflow/dags/airflow_tutorial.py'
The problem seems to be that the path there is wrong (/home/ubuntu/airflow/dags/airflow_tutorial.py), since the correct path on the worker is /home/hadoop/...
On the server machine the path does contain ubuntu, but in both config files it's simply ~/airflow/...
What makes the worker look in this path and not the correct one?
How do I tell it to look in its own home dir?
edit:
I ran grep -R ubuntu and the only occurrences are in the logs. When everything runs as the ubuntu user it works, which leads me to believe that for some reason Airflow provides the worker with the full path of the task as seen on the scheduler machine.

Adding the --raw parameter to the airflow run command helped me see what the original exception was. In my case, the metadata database instance was too slow, and loading DAGs failed because of a timeout. I fixed it by increasing dagbag_import_timeout in airflow.cfg. Hope this helps!
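A minimal sketch of both steps, assuming the old airflow run CLI shown in the question; the dag_id, task_id, and execution_date are placeholders, and 120 is just an example timeout value:

    # take the failing command from the worker log and add --raw to surface the original exception
    airflow run <dag_id> <task_id> <execution_date> --raw

    # airflow.cfg, [core] section: allow more time for DAG parsing
    dagbag_import_timeout = 120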
I'm experiencing the same thing - the worker process appears to pass an --sd
argument corresponding to the dags folder on the scheduler machine, not on the worker machine (even if dags_folder
is set correctly in the airflow config file on the worker). In my case I was able to get things working by creating a symlink on the scheduler host such that dags_folder
can be set to the same value. (In your example, this would mean creating a symlink /home/hadoop -> /home/ubuntu on the scheduler machine, and then setting dags_folder to /home/hadoop). So, this is not really an answer to the problem but it is a viable workaround in some cases.
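A sketch of that workaround using the paths from the question (the exact dags path under the symlink depends on your layout):

    # on the scheduler machine: make /home/hadoop resolve to the real home directory
    sudo ln -s /home/ubuntu /home/hadoop

    # then, in airflow.cfg on both machines
    dags_folder = /home/hadoop/airflow/dags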
Have you tried setting the dags_folder parameter in the config file to point explicitly to /home/hadoop/, i.e. the desired path?
This parameter controls the location Airflow looks in for DAGs.
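For example, in airflow.cfg on the worker (the path is just the one mentioned in the question):

    [core]
    dags_folder = /home/hadoop/airflow/dags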