
Airflow DAG "seems to be existing only locally. The master scheduler doesn't seem to be aware of its existence"

Tags: airflow

I use Airflow to run a workflow of Spark jobs. After installation, I copied the DAG files into the DAGs folder set in airflow.cfg. I can backfill the DAG and the BashOperators run successfully, but there is always a warning like the one in the title. I haven't verified whether scheduling works, and I doubt it can, since the warning says the master scheduler isn't aware of my DAG's existence. How can I eliminate this warning and get scheduling to work? Has anybody run into the same issue?

bronzels asked Apr 23 '18 02:04




1 Answer

This is usually caused by the scheduler not running or by the refresh interval being too long. No log entries were posted, so we cannot analyze the problem from there. Unfortunately, the actual cause may have been overlooked, because this is usually the root of the problem:

I didn't verify if the scheduling is fine.

So first you should check if both of the following services are running:

airflow webserver

and

airflow scheduler
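One rough way to run that check from a script is to scan the process list for both commands. The sketch below is only an illustration under some assumptions: the helper names `missing_services` and `list_process_commands` are made up for this example, a POSIX `ps` is assumed to be available, and the match is a plain substring search on the command line, so it can miss services started through a wrapper or under a different name.

```python
import subprocess

# Command strings we expect to see in the process list (taken from the answer above).
REQUIRED = ("airflow scheduler", "airflow webserver")


def missing_services(process_lines, required=REQUIRED):
    """Return the required command strings that appear in none of the process lines."""
    return [
        cmd for cmd in required
        if not any(cmd in line for line in process_lines)
    ]


def list_process_commands():
    """Return the full command line of every running process (assumes a POSIX `ps`)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `missing_services(list_process_commands())` returns an empty list when both commands are found. Any name it does return is a service to start before looking further.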

If that doesn't help, see this post for further reference: Airflow 1.9.0 is queuing but not launching tasks

tobi6 answered Oct 21 '22 05:10