
Why use CustomOperator over PythonOperator in Apache Airflow?

Tags:

airflow

While using Apache Airflow, I can't work out why someone would create a CustomOperator instead of using a PythonOperator. Wouldn't calling a Python function from a PythonOperator give the same result as a CustomOperator?

If someone knows the different use cases and best practices, that would be great!

Thanks a lot for your help

Antoine Krajnc asked Jan 04 '20 15:01




1 Answer

Both operators, while similar, sit at different levels of abstraction, and depending on your use case, one may be a better fit than the other.

Code defined in a CustomOperator can easily be reused by multiple DAGs. If many DAGs need to perform the same task, it may make more sense to expose that code to them via a CustomOperator.
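As a rough illustration of that reuse, here is a minimal sketch of a custom operator. The names `CountLinesOperator` and `count_lines` are hypothetical, and the fallback `BaseOperator` class is only a stand-in so the snippet can be run without Airflow installed; in a real project you would import Airflow's own `BaseOperator`.

```python
try:
    from airflow.models.baseoperator import BaseOperator
except ImportError:
    # Stand-in (assumption) so the sketch runs without Airflow installed.
    class BaseOperator:
        def __init__(self, task_id, **kwargs):
            self.task_id = task_id


def count_lines(path):
    """Hypothetical shared logic: count the lines in a text file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


class CountLinesOperator(BaseOperator):
    """Hypothetical operator that any DAG can import and reuse."""

    def __init__(self, path, **kwargs):
        # task_id and other standard arguments are passed on to BaseOperator.
        super().__init__(**kwargs)
        self.path = path

    def execute(self, context):
        # Airflow calls execute() when the task runs; by default the
        # return value is pushed to XCom.
        return count_lines(self.path)
```

Any DAG could then use it like a built-in operator, for example `CountLinesOperator(task_id="count_lines", path="/tmp/data.csv")` (the path is made up), instead of copying the function into every DAG file.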

PythonOperator is very general and is a better fit for one-off, DAG-specific tasks.
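For comparison, a one-off task with PythonOperator might look roughly like the sketch below. It assumes the Airflow 2.x import path `airflow.operators.python`; the `greet` function, the DAG id, and the task id are hypothetical. The import guard is only there so the callable can be tried without Airflow installed.

```python
from datetime import datetime

try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator  # Airflow 2.x path
except ImportError:
    # Assumption: without Airflow we skip the DAG and keep only the callable.
    DAG = PythonOperator = None


def greet(name):
    """Hypothetical one-off logic that only this DAG needs."""
    return f"Hello, {name}!"


if DAG is not None:
    with DAG(dag_id="one_off_example", start_date=datetime(2020, 1, 1)) as dag:
        greet_task = PythonOperator(
            task_id="greet",
            python_callable=greet,          # plain function defined above
            op_kwargs={"name": "Airflow"},  # keyword arguments for greet()
        )
```

The function lives in the DAG file itself, which is convenient for a single DAG but means other DAGs cannot share it without importing from or duplicating that file.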

At the end of the day, the default operators provided in Airflow are just tools. Whether you use one of those tools (default operators) or create your own (custom operators) is a choice determined by several factors:

  1. The type of task you are trying to accomplish.
  2. Code organization requirements necessitated by policy or the number of people maintaining the pipeline.
  3. Simple personal taste.
Victor answered Sep 27 '22 19:09
