I'm running an airflow server and a worker on different AWS machines.
I've synced the dags folder between them, run airflow initdb on both, and checked that the dag_ids are the same when I run airflow list_tasks <dag_id>.
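For context, the checks I mean were roughly the following, run on both machines (Airflow 1.x CLI; airflow_tutorial is just a stand-in dag_id):

    # initialise the metadata DB and confirm the dag is visible
    airflow initdb
    airflow list_dags
    airflow list_tasks airflow_tutorial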
When I run the scheduler and worker, I get this error on the worker:
    airflow.exceptions.AirflowException: dag_id could not be found: . Either the dag did not exist or it failed to parse. [...] Command ...--local -sd /home/ubuntu/airflow/dags/airflow_tutorial.py'
The problem seems to be that the path there is wrong (/home/ubuntu/airflow/dags/airflow_tutorial.py), since the correct path on the worker is /home/hadoop/...
On the server machine the path does contain ubuntu, but in both config files the dags folder is simply ~/airflow/...
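(For clarity, that ~ expands per user, so the same config line resolves to different absolute paths on the two machines, e.g.:

    # as the ubuntu user on the server
    echo ~/airflow/dags    # /home/ubuntu/airflow/dags
    # as the hadoop user on the worker
    echo ~/airflow/dags    # /home/hadoop/airflow/dags
)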
What makes the worker look in this path rather than the correct one?
How do I tell it to look in its own home directory?
edit:
I ran grep -R ubuntu and the only occurrences are in the logs. When running everything as an ubuntu user, everything works, which leads me to believe that for some reason airflow provides the worker with the full path of the task.

Adding the --raw parameter to the airflow run command helped me see what the original exception was. In my case, the metadata database instance was too slow and loading the dags failed because of a timeout. I fixed it by increasing dagbag_import_timeout in airflow.cfg. Hope this helps!
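For anyone hitting the same error, a rough sketch of those two steps (the dag_id, task_id, and date are placeholders; the timeout value is only an example, the default is 30 seconds):

    # re-run the failing task with --raw to surface the underlying exception (Airflow 1.x CLI)
    airflow run --raw airflow_tutorial some_task 2017-01-01

and then in airflow.cfg:

    [core]
    dagbag_import_timeout = 120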
I'm experiencing the same thing - the command passed to the worker includes an -sd argument corresponding to the dags folder on the scheduler machine, not on the worker machine (even if dags_folder is set correctly in the airflow config file on the worker). In my case I was able to get things working by creating a symlink on the scheduler host such that dags_folder can be set to the same value. (In your example, this would mean creating a symlink /home/hadoop -> /home/ubuntu on the scheduler machine, and then setting dags_folder to /home/hadoop.) So, this is not really an answer to the problem, but it is a viable workaround in some cases.
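Sketched concretely for the layout in the question (assuming the dags actually live under <home>/airflow/dags on each machine):

    # on the scheduler machine: make the worker's path resolve to the real dags location
    sudo ln -s /home/ubuntu /home/hadoop

    # then use the same value in airflow.cfg on both machines, e.g.
    # dags_folder = /home/hadoop/airflow/dags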
Have you tried setting the dags_folder parameter in the config file to point explicitly to /home/hadoop/, i.e. the desired path?
This parameter controls the location Airflow looks in for dags.
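A minimal sketch of what that would look like in airflow.cfg on the worker (the exact dags path is an assumption based on the question):

    [core]
    dags_folder = /home/hadoop/airflow/dags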