I'm using Apache Airflow, and I can't see why someone would create a CustomOperator instead of using a PythonOperator. Wouldn't running a Python function inside a PythonOperator lead to the same results as a CustomOperator?
If someone could explain the different use cases and best practices, that would be great!
Thanks a lot for your help.
The PythonOperator is an exception to the usual templating behaviour. Rather than taking arguments that are templated with the runtime context, it accepts a python_callable argument, and the runtime context can be applied inside that callable.
One can take a different approach by increasing the number of threads available on the machine that runs the scheduler process so that the max_threads parameter can be set to a higher value. With a higher value, the Airflow scheduler will be able to more effectively process the increased number of DAGs.
Apache Airflow is used for the scheduling and orchestration of data pipelines or workflows. Orchestration of data pipelines refers to sequencing, coordinating, scheduling, and managing complex data pipelines from diverse sources.
Both operators, while similar, are really at different abstraction levels, and depending on your use case, one may be a better fit than the other.
Code defined in a CustomOperator can be easily used by multiple DAGs. If you have a lot of DAGs that need to perform the same task, it may make more sense to expose this code to the DAGs via a CustomOperator.
PythonOperator is very general and is a better fit for one-off, DAG-specific tasks.
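To make the reuse point concrete, here is a minimal sketch of a custom operator. GreetOperator and its recipient parameter are hypothetical names invented for illustration; the only Airflow API assumed is subclassing BaseOperator and overriding execute(context). The fallback class is a stand-in so the sketch can also run where Airflow is not installed.

```python
try:
    from airflow.models.baseoperator import BaseOperator
except ImportError:
    # Stand-in (not part of Airflow) so the sketch runs without Airflow installed.
    class BaseOperator:
        def __init__(self, task_id, **kwargs):
            self.task_id = task_id


class GreetOperator(BaseOperator):
    """Hypothetical reusable operator that builds a greeting."""

    def __init__(self, recipient, **kwargs):
        # Pass task_id and any other standard arguments on to BaseOperator.
        super().__init__(**kwargs)
        self.recipient = recipient

    def execute(self, context):
        # Airflow calls execute() when the task runs; the shared logic lives here.
        return f"Hello, {self.recipient}!"
```

Any DAG could then import the class and instantiate it, for example GreetOperator(task_id="greet", recipient="Airflow"), instead of copying a function into each DAG file. For logic that only one DAG needs, passing a plain function as python_callable to a PythonOperator is usually less ceremony.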
At the end of the day, the default operators provided in Airflow are just tools. Which tool you end up using (default operators), or whether it makes sense to create your own custom tool (custom operators), is a choice determined by a bunch of factors: