I have the following DAGs:
The first DAG, scheduled with 0 1 * * *, ran without any problem. The second DAG, scheduled with 0 10 1 * *, did not run.
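Simplified, the two DAGs look roughly like this (the dag_ids, start_date, and dummy tasks here are placeholders; only the two cron schedules are the real ones):

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator  # Airflow 1.x import path

# Placeholder reconstruction -- only the schedules come from my setup.
daily_dag = DAG(dag_id="daily_dag", schedule_interval="0 1 * * *",
                start_date=datetime(2018, 6, 1))
monthly_dag = DAG(dag_id="monthly_dag", schedule_interval="0 10 1 * *",
                  start_date=datetime(2018, 6, 1))

DummyOperator(task_id="daily_task", dag=daily_dag)
DummyOperator(task_id="monthly_task", dag=monthly_dag)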
When I do:
import datetime
print(datetime.datetime.now())
I get:
2018-07-01 12:14:15.632812
So I don't understand why this DAG hasn't been scheduled. I understand that it's not required to run at exactly 10:00, but its state should at least be Running.
The "Latest Run" of the first DAG shows 2018-06-30 01:00. I suspect that I don't actually understand the Airflow clock: from my point of view the last run was at 2018-07-01 01:00, because it ran this morning, not yesterday.
Edit: I saw this paragraph in the documentation:
"Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be trigger soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended."
So I'm wondering: should I schedule everything one day before the date I actually want? If I want something to run at 0 10 1 * *, should I schedule it at 0 10 30 * * instead? In other words, if I want something to run on the 1st of each month at 10:00, should I schedule it for the last day of each month at 10:00? Where is the logic in that? This is very hard to understand and follow.
It gets worse. According to this, there is no way to tell the scheduler this input. What am I to do?!
Airflow schedules tasks to run at the END of a schedule interval. This can be a little counterintuitive, but it is based around the idea that the data for a particular interval isn't available until that interval is over.
Suppose you had a workflow that is supposed to run every day. You can't get all of the data for yesterday until that day is over (today).
In your case, it makes sense that the first DAG's last run is for yesterday, since that was the execution_date associated with that DagRun - your DAG ran today for yesterday's data.
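You can see this from the cron schedule itself: a run stamped with a given execution_date only starts at the next tick of the schedule, once the interval it covers has ended. A small sketch using croniter (a library Airflow itself depends on):

from datetime import datetime
from croniter import croniter

# The run the UI labels with an execution_date only *starts* at the
# following tick of the schedule, once the interval it covers has ended.
latest_run = datetime(2018, 6, 30, 1, 0)  # "Latest Run" shown in the UI
next_tick = croniter("0 1 * * *", latest_run).get_next(datetime)
print(next_tick)  # 2018-07-01 01:00:00 -- the morning the DAG actually executed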
If you want your DAG to run on the 1st of every month, then changing the schedule isn't a bad idea. However, if you want your DAG to run for the data associated with the 1st of every month (i.e. pass that date into an API request or a SQL query), then you have it right already.
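For that second case, the usual pattern is to keep the 0 10 1 * * schedule and consume the stamped date through templating. A minimal sketch (the dag_id, task_id, and echo command are made up for illustration):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

dag = DAG(dag_id="monthly_report", schedule_interval="0 10 1 * *",
          start_date=datetime(2018, 1, 1))

# {{ ds }} renders to the run's execution_date (e.g. 2018-06-01),
# not the wall-clock date the task actually runs on (2018-07-01).
report = BashOperator(
    task_id="build_report",
    bash_command='echo "building report for {{ ds }}"',
    dag=dag,
)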