In Airflow, everything is supposed to be in UTC (which is not affected by DST).
However, we have workflows that deliver things based on time zones that are affected by DST.
An example scenario:
Is there a way to schedule dags so they run at the correct time after a time change?
Off the top of my head:
If your machine is timezone-aware, set up your DAG to run at the UTC equivalents of both 8AM EST and 8AM EDT. 8AM EST is 13:00 UTC and 8AM EDT is 12:00 UTC, so something like 0 12,13 * * *. Make the first task a ShortCircuitOperator. In its callable, use something like pytz to convert the current UTC time to US/Eastern. If the local hour is the one you want, return True and the DAG continues; otherwise, return False and the downstream tasks are skipped. The overhead is small (two check-task runs per day, one of which skips the rest of the DAG), and the added latency should be minimal as long as your machine isn't overloaded.
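To double-check the UTC hours for the cron expression, you can let the time zone database do the arithmetic. This is a minimal sketch that uses the standard-library zoneinfo module (Python 3.9+) instead of pytz; "America/New_York" is the canonical IANA name that "US/Eastern" aliases, and the helper name utc_hours_for_local_time is hypothetical. Sampling the 15th of every month is an assumption that works because DST transitions never fall on that day in the US.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")  # canonical name behind "US/Eastern"
UTC = ZoneInfo("UTC")


def utc_hours_for_local_time(local_hour, tz, year):
    """Return the sorted UTC hours that `local_hour` in `tz` maps to during `year`.

    One date per month is sampled; months on both sides of a DST change
    contribute a different UTC hour, so the result has one entry per offset.
    """
    hours = set()
    for month in range(1, 13):
        local = datetime(year, month, 15, local_hour, tzinfo=tz)
        hours.add(local.astimezone(UTC).hour)
    return sorted(hours)


hours = utc_hours_for_local_time(8, EASTERN, 2024)
cron = "0 {} * * *".format(",".join(str(h) for h in hours))
print(hours, cron)  # [12, 13] 0 12,13 * * *
```

Running the same helper for any other zone gives the hour field you would need for that zone's cron line.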
A sloppy example:
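from datetime import datetime
from pytz import utc, timezone
# Airflow 1.x import; in Airflow 2 use: from airflow.operators.python import ShortCircuitOperator
from airflow.operators.python_operator import ShortCircuitOperator
# ...
def is8AM(**kwargs):
    curtime = utc.localize(datetime.utcnow())
    # If you want to use the execution date instead: since Airflow 1.10 it is
    # already timezone-aware, so use it directly (utc.localize would raise):
    # curtime = kwargs["ti"].execution_date
    eastern = timezone('US/Eastern')  # From the pytz docs; check your local zone name
    loc_dt = curtime.astimezone(eastern)
    # Returning False makes ShortCircuitOperator skip all downstream tasks
    return loc_dt.hour == 8

start_task = ShortCircuitOperator(
    task_id='check_for_8AM',
    python_callable=is8AM,
    provide_context=True,  # needed in Airflow 1.x; not needed in Airflow 2
    dag=dag
)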
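The check itself is easy to test away from Airflow if the clock is passed in rather than read inside the callable. This is a sketch under that assumption, again swapping pytz for the standard-library zoneinfo; is_local_hour and eastern_8am_callable are hypothetical names, and the wrapper is only meant to show how the testable function could back a ShortCircuitOperator callable.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def is_local_hour(now_utc, hour=8, tz=EASTERN):
    """True if the aware datetime `now_utc` falls within `hour` o'clock in `tz`."""
    if now_utc.tzinfo is None:
        raise ValueError("now_utc must be timezone-aware")
    return now_utc.astimezone(tz).hour == hour


def eastern_8am_callable(**context):
    # Hypothetical python_callable: reads the current UTC time and defers to the
    # testable function above. Extra Airflow context kwargs are ignored.
    return is_local_hour(datetime.now(timezone.utc))


# The two daily runs (12:00 and 13:00 UTC) on either side of DST:
for stamp in (datetime(2024, 1, 15, 12, tzinfo=timezone.utc),   # 7AM EST -> skip
              datetime(2024, 1, 15, 13, tzinfo=timezone.utc),   # 8AM EST -> run
              datetime(2024, 7, 15, 12, tzinfo=timezone.utc),   # 8AM EDT -> run
              datetime(2024, 7, 15, 13, tzinfo=timezone.utc)):  # 9AM EDT -> skip
    print(stamp.isoformat(), is_local_hour(stamp))
```

Exactly one of the two runs passes on any given day, which is what makes the 0 12,13 * * * schedule plus a short-circuit check behave like "8AM Eastern" all year.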
Hope this is helpful
Edit: the run times were wrong (I subtracted the UTC offset instead of adding it). Additionally, because Airflow launches a run at the end of its schedule interval, you'll probably end up wanting to schedule for 7AM with an hourly schedule if you want the runs to start at 8.
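If you check the execution date rather than the wall clock, the interval shift matters: the run labelled 7AM is the one that actually starts at 8AM. A rough sketch, assuming the classic Airflow semantics where a run is launched at execution_date + schedule interval; actual_start and should_run are hypothetical helpers, not Airflow APIs.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def actual_start(execution_date, interval):
    # Assumption: a run is launched once its schedule interval has ended.
    return execution_date + interval


def should_run(execution_date, interval=timedelta(hours=1), hour=8, tz=EASTERN):
    """Check the local hour the run actually starts at, not its execution_date label."""
    return actual_start(execution_date, interval).astimezone(tz).hour == hour


# Hourly schedule in January: the run labelled 12:00 UTC (7AM EST) starts at 8AM EST.
print(should_run(datetime(2024, 1, 15, 12, tzinfo=timezone.utc)))  # True
print(should_run(datetime(2024, 1, 15, 13, tzinfo=timezone.utc)))  # False
```

Converting the start time rather than the label keeps the check correct on both sides of a DST change, since the offset is applied to the moment the run actually begins.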