I was wondering if there is any memory size limit for an XCom in Airflow?
XComs (short for “cross-communications”) are a mechanism that lets Tasks talk to each other, as by default Tasks are entirely isolated and may be running on entirely different machines. An XCom is identified by a key (essentially its name), as well as the task_id and dag_id it came from.
Pulling an XCom with xcom_pull
To pull an XCom from a task, you have to use the xcom_pull method. Like xcom_push, this method is available through a task instance object. xcom_pull expects 2 arguments: task_ids, so that only XComs from tasks matching those ids are pulled, and key, so that only XComs with a matching key are returned.
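The push/pull pattern described above can be sketched as follows. This is a minimal illustration, not a real DAG: `FakeTaskInstance` is a hypothetical in-memory stand-in so the snippet runs without Airflow installed, and the task names `train_model` and `choose_model` as well as the key `model_accuracy` are made up for the example. In a real DAG, Airflow passes the actual task instance (`ti`) to the callables.

```python
class FakeTaskInstance:
    """Hypothetical in-memory stand-in for an Airflow task instance (not part of Airflow)."""

    def __init__(self, task_id, store):
        self.task_id = task_id
        self._store = store  # shared dict: (task_id, key) -> value

    def xcom_push(self, key, value):
        # Store the value under the pushing task's id and the given key.
        self._store[(self.task_id, key)] = value

    def xcom_pull(self, task_ids, key="return_value"):
        # Return the value pushed by the task with the matching id and key.
        return self._store.get((task_ids, key))


def train_model(ti):
    # Push a small value under an explicit key.
    ti.xcom_push(key="model_accuracy", value=0.93)


def choose_model(ti):
    # Pull the value pushed by the task whose id is "train_model".
    return ti.xcom_pull(task_ids="train_model", key="model_accuracy")


store = {}
train_model(FakeTaskInstance("train_model", store))
accuracy = choose_model(FakeTaskInstance("choose_model", store))
print(accuracy)
```

Note that only a small scalar is exchanged here, which is the kind of payload XComs are meant for.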
Airflow is NOT a processing framework. It is neither Spark nor Flink. Airflow is an orchestrator, and it is the best orchestrator. There are no optimisations for processing big data in Airflow, nor a way to distribute it (maybe with one executor, but this is another topic). If you try to exchange big data between your tasks, you will end up with a memory overflow error! Oh, and do you know the XCom size limit in Airflow?
It depends on the database you use:
SQLite: 2 GB
Postgres: 1 GB
MySQL: 64 KB
Yes, 64 kilobytes for MySQL! Again, use XComs only for sharing small amounts of data.
ref: https://marclamberti.com/blog/airflow-xcom/
After looking at the source code, it looks like there is none: the column type is a large binary (LargeBinary) in SQLAlchemy. According to the SQLAlchemy documentation, this is an unlengthed binary type for the target platform, such as BLOB on MySQL and BYTEA for PostgreSQL.
According to the source code (see this source code link), the maximum XCom size is 48 KB.
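Given the limits quoted in this thread, one defensive option is to estimate a value's size before pushing it. The snippet below is only a sketch under stated assumptions: `MAX_XCOM_BYTES` is a hypothetical threshold set to 48 KB to match the figure above, the helper names are invented for illustration, and the JSON-encoded length is used as a rough estimate, since the serialization Airflow actually applies depends on version and configuration.

```python
import json

# Assumption: a conservative threshold matching the 48 KB figure quoted above.
MAX_XCOM_BYTES = 48 * 1024


def xcom_payload_size(value):
    """Rough size estimate: length in bytes of the JSON-encoded value."""
    return len(json.dumps(value).encode("utf-8"))


def is_safe_for_xcom(value, limit=MAX_XCOM_BYTES):
    """Return True if the estimated payload size fits within the limit."""
    return xcom_payload_size(value) <= limit


# A small reference to the data (a path plus a row count) fits easily.
small = {"path": "s3://example-bucket/output.parquet", "rows": 1200}

# The data itself does not: 100,000 integers are far beyond 48 KB as JSON.
large = list(range(100_000))

print(is_safe_for_xcom(small))
print(is_safe_for_xcom(large))
```

The design choice this illustrates is the one the answers point to: write large data to external storage and pass only a small reference, such as a path, through the XCom.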