
Airflow PythonOperator: why include the 'ds' arg?

Tags:

airflow

When defining a function that will later be used as a python_callable, why is 'ds' included as the first argument of the function?

For example:

def python_func(ds, **kwargs):
    pass

I looked into the Airflow documentation, but could not find any explanation.

subba asked Nov 10 '16 16:11





1 Answer

This is related to the provide_context=True parameter. According to the Airflow documentation:

if set to true, Airflow will pass a set of keyword arguments that can be used in your function. This set of kwargs correspond exactly to what you can use in your jinja templates. For this to work, you need to define **kwargs in your function header.

ds is one of these keyword arguments and represents the execution date in the format "YYYY-MM-DD". For parameters that are marked as (templated) in the documentation, you can use the '{{ ds }}' default variable to pass the execution date. You can read more about default variables here:

https://pythonhosted.org/airflow/code.html?highlight=pythonoperator#default-variables (obsolete)

https://airflow.incubator.apache.org/concepts.html?highlight=python_callable

python_callable is not a templated parameter of PythonOperator, so doing something like

python_callable=print_execution_date('{{ ds }}')

won't work: the function is called immediately, when the DAG file is parsed, with the literal string '{{ ds }}' as its argument. To print the execution date inside the callable of your PythonOperator, define it as

def print_execution_date(ds, **kwargs):
    print(ds)

or

def print_execution_date(**kwargs):
    print(kwargs.get('ds'))
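
To see why both signatures work, here is a minimal sketch in plain Python that does not import Airflow. It assumes the context is handed to the callable as keyword arguments, as the documentation quote above describes. The fake_context dict, its values, and the two function names are hypothetical stand-ins, not Airflow's real context.

```python
# Hypothetical stand-in for the keyword arguments Airflow passes when
# provide_context=True. A real context has many more keys; these values
# are made up for illustration.
fake_context = {
    "ds": "2016-11-10",
    "other_key": "other value",
}


def with_named_ds(ds, **kwargs):
    # 'ds' is bound by name; every remaining key is collected in kwargs.
    return ds, sorted(kwargs)


def with_kwargs_only(**kwargs):
    # Nothing is bound by name, so 'ds' has to be looked up in kwargs.
    return kwargs.get("ds")


# Unpacking the dict mimics a call made with keyword arguments.
named_result = with_named_ds(**fake_context)
kwargs_result = with_kwargs_only(**fake_context)

print(named_result)
print(kwargs_result)
```

In other words, naming ds in the signature is only a convenience: Python's keyword-argument binding pulls that one key out, and the rest stays available in kwargs.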

Hope this helps.

Dmitri Safine answered Oct 19 '22 08:10
