Command:
python dag.py backfill -i -t task1 --pool backfill -s "2016-05-29 03:00:00" -e "2016-06-07 00:00:00"
All the tasks get queued and all start running; the pool's maximum capacity is essentially ignored.
From what I know, pool oversubscription is a known issue in 1.7.1.3 (the latest stable release). Further, the Airflow backfill job runner doesn't respect pool constraints - only the scheduler does, and the scheduler doesn't schedule or deal with backfills. I believe this is supposed to change in the next version - not sure though.
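For context, pools are assigned per task in the DAG definition. A minimal sketch of the kind of setup the command above targets (the DAG id, schedule, and bash command are assumptions, not anything from the question):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id='my_backfill_dag',           # hypothetical name
        start_date=datetime(2016, 5, 29, 3, 0),
        schedule_interval='@hourly',
    )

    task1 = BashOperator(
        task_id='task1',
        bash_command='echo run',
        pool='backfill',  # the pool the backfill command is expected to honor,
                          # but which the 1.7.1.3 backfill runner ignores
        dag=dag,
    )

The point is that nothing about the task definition is wrong here - the pool limit is simply not enforced by the backfill code path in this release.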
Under the current release, 1.7.1.3, backfilling is, in my experience, almost always a bad idea. The scheduler can end up fighting with the backfill job, the backfilled DAG can enter odd states, and things generally end up in a smoking ruin.
Generally, I've found more success by making sure my jobs distribute well across workers and finish in a reasonable time, and then trusting the scheduler and the task start_date to carry the runs through to completion.
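A rough sketch of what I mean, assuming a DAG that simply has its start_date set far enough back that the scheduler creates the historical DAG runs itself instead of a backfill command doing it (all names here are placeholders):

    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        'owner': 'airflow',
        # start_date in the past: the scheduler will create the missing
        # historical DAG runs on its own, one schedule_interval at a time
        'start_date': datetime(2016, 5, 29, 3, 0),
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG(
        dag_id='scheduler_fills_history',   # hypothetical name
        default_args=default_args,
        schedule_interval='@hourly',
    )

    task1 = BashOperator(
        task_id='task1',
        bash_command='echo run',
        pool='backfill',  # pool limits are enforced when the scheduler queues tasks
        dag=dag,
    )

Because the scheduler, not the backfill runner, is queuing these task instances, the pool limit actually applies.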
This approach does end up with some pretty severe over-subscription of the number of DAG runs, and the scheduler tends to choke once it is past the configured limit. The solution: temporarily bump the configuration limit for DAG runs. The scheduler and executor tend to work reasonably well together to make sure you don't actually end up running too many jobs at the same time.
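Concretely, that means raising the relevant caps in airflow.cfg and restarting the scheduler. A sketch, assuming the [core] keys below (check your own airflow.cfg for the exact names and defaults in your version):

    [core]
    # assumed setting: max number of active DAG runs per DAG the scheduler allows
    max_active_runs_per_dag = 64
    # assumed setting: max number of task instances allowed to run per DAG
    dag_concurrency = 16

Once the history is caught up, drop the values back to something sane so a future incident can't fan out into hundreds of concurrent runs.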