Airflow: Why doesn't the scheduler start my DAG?

I have the following DAG:
The first DAG, with schedule 0 1 * * *, ran without any problem. The second DAG, with schedule 0 10 1 * *, did not run.
When I do:
import datetime
print(datetime.datetime.now())
I get:
2018-07-01 12:14:15.632812
So I don't understand why this DAG hasn't been scheduled. I understand that it isn't required to run at exactly 10:00, but its state should at least be Running.
The "Latest Run" shown for the first DAG is 2018-06-30 01:00, so I suspect I don't actually understand Airflow's clock. From my point of view the last run was 2018-07-01 01:00, because it ran this morning, not yesterday.
Edit:
I saw this paragraph in the documentation:
"Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be trigger soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended."
So I'm wondering: should I schedule everything one interval before the date I actually want? If I want something to run at 0 10 1 * *, should I schedule it as 0 10 30 * *? In other words, if I want something to run on the 1st of each month at 10:00, should I schedule it for the last day of each month at 10:00?
Where is the logic in that? This is very hard to understand and follow.
It gets worse: according to this, there is no way to give the scheduler such an expression. What am I to do?!
1 Answer
Airflow schedules tasks to run at the END of a schedule interval. This can be a little counterintuitive, but it is based on the idea that the data for a particular interval isn't available until that interval is over.
Suppose you have a workflow that is supposed to run every day: you can't get all of yesterday's data until that day is over, which means today.
In your case, it makes sense that the first DAG's last run is for yesterday, since that is the execution_date associated with that DagRun: your DAG ran today for yesterday's data.
If you simply want your DAG to run on the 1st of every month, then changing the schedule isn't a bad idea. However, if you want your DAG to run for the data associated with the 1st of every month (i.e. to pass that date into an API request or a SQL query), then you have it right.
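To make the interval-end behaviour concrete, here is a minimal sketch of this kind of DAG (the dag_id, task and start_date below are illustrative assumptions, not the asker's actual code). With these settings, the run stamped with execution_date 2018-06-01T10:00 is only triggered around 2018-07-01T10:00, once the monthly interval it covers has ended:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def report(**context):
    # execution_date is the start of the interval this run covers,
    # not the wall-clock time at which the task actually executes.
    print("running for the interval starting at", context["execution_date"])


dag = DAG(
    dag_id="monthly_report",           # hypothetical name
    start_date=datetime(2018, 6, 1),
    schedule_interval="0 10 1 * *",    # 10:00 on the 1st of every month
)

report_task = PythonOperator(
    task_id="report",
    python_callable=report,
    provide_context=True,              # passes execution_date into the callable
    dag=dag,
)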
Sorry, but this does not answer my question. I'm aware of Airflow's features. My code needs to run on the 1st day of every month at 10 AM. I don't pass any date argument; it is simply a function that needs to run, and Airflow doesn't let me express that. There is no cron expression that can say "last day of the month"; you would need to build three different cron expressions for that.
– jack
Jul 2 at 5:21
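For the "last day of the month" problem, one commonly used workaround (a sketch under assumed names, not something proposed in the answer above) is to schedule the DAG daily and short-circuit every run except the one whose execution_date falls on the last day of its month; with Airflow's interval-end scheduling, that surviving run actually fires on the 1st of the following month at 10:00:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import ShortCircuitOperator


def is_last_day_of_month(**context):
    execution_date = context["execution_date"]
    # A day is the last of its month exactly when the following day is the 1st.
    return (execution_date + timedelta(days=1)).day == 1


dag = DAG(
    dag_id="monthly_on_the_first",     # hypothetical name
    start_date=datetime(2018, 6, 1),
    schedule_interval="0 10 * * *",    # every day at 10:00
)

# Skips all downstream tasks unless execution_date is the last day of the month.
check = ShortCircuitOperator(
    task_id="only_last_day_of_month",
    python_callable=is_last_day_of_month,
    provide_context=True,
    dag=dag,
)
# The real work would be chained after the check, e.g. check >> actual_task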
@jack I think you're misunderstanding the "schedule interval". If your start date is 2018-06-01 and your schedule interval is 0 10 1 * *, then on 2018-07-01T10:00:00 your execution for 2018-06-01T10:00:00 will start running. crontab.guru/#0_10_1_*_*
– dlamblin
Jul 2 at 9:47
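For reference, the fire times of that cron expression can also be checked outside Airflow with croniter, the library Airflow uses for cron schedules (a minimal sketch):

from datetime import datetime

from croniter import croniter

schedule = croniter("0 10 1 * *", datetime(2018, 6, 1))
print(schedule.get_next(datetime))  # 2018-06-01 10:00:00
print(schedule.get_next(datetime))  # 2018-07-01 10:00:00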
@dlamblin I'm not sure I understand. My start date is 'start_date': datetime(2018, 06, 21). This is the date on which I launched the DAG.
– jack
Jul 2 at 11:11
So if you have your start_date in late June and the interval set to run every month, it will run on the first day of July, as that's when the interval (monthly, starting in June) will have finished.
– Viraj Parekh
Jul 2 at 19:28
What is your start date, jack?
– dlamblin
Jul 2 at 9:44