
Configuring Airflow to work with CeleryExecutor

Tags:

airflow

I am trying to configure Airbnb's Airflow to use the CeleryExecutor.

I changed the executor in airflow.cfg from SequentialExecutor to CeleryExecutor:

# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor
executor = CeleryExecutor

But I get the following error:

airflow.configuration.AirflowConfigException: error: cannot use sqlite with the CeleryExecutor

Note that the sql_alchemy_conn is configured like this:

sql_alchemy_conn = sqlite:////root/airflow/airflow.db

I looked at Airflow's Git repository (https://github.com/airbnb/airflow/blob/master/airflow/configuration.py) and found that the following code throws this exception:

def _validate(self):
    if (
            self.get("core", "executor") != 'SequentialExecutor' and
            "sqlite" in self.get('core', 'sql_alchemy_conn')):
        raise AirflowConfigException("error: cannot use sqlite with the {}".
            format(self.get('core', 'executor')))

From this _validate method it appears that sql_alchemy_conn cannot contain sqlite unless the executor is SequentialExecutor.
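The condition can be reproduced outside Airflow with a small stand-alone sketch. The function name `check_backend` and the plain `ValueError` are stand-ins chosen for illustration and are not part of Airflow; only the condition itself mirrors the excerpt above.

```python
def check_backend(executor: str, sql_alchemy_conn: str) -> None:
    """Mimic the condition from the _validate excerpt (illustrative only)."""
    # Any executor other than SequentialExecutor is rejected when the
    # connection string mentions sqlite.
    if executor != "SequentialExecutor" and "sqlite" in sql_alchemy_conn:
        raise ValueError(
            "error: cannot use sqlite with the {}".format(executor)
        )


# SequentialExecutor with SQLite passes the check silently.
check_backend("SequentialExecutor", "sqlite:////root/airflow/airflow.db")

# CeleryExecutor with SQLite is rejected, matching the reported error.
try:
    check_backend("CeleryExecutor", "sqlite:////root/airflow/airflow.db")
except ValueError as exc:
    print(exc)
```

So the executor setting alone is not the problem; it is the combination of a non-sequential executor with a SQLite connection string that trips the check.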

Do you have any idea how to configure the CeleryExecutor without SQLite? Please note that I installed RabbitMQ, as required for working with the CeleryExecutor.

Elad Eldor asked Apr 24 '16 11:04




2 Answers

Airflow states that the CeleryExecutor requires a backend other than the default SQLite database. You have to use MySQL or PostgreSQL, for example.

The sql_alchemy_conn in airflow.cfg must be changed to follow the SQLAlchemy connection string structure (see the SQLAlchemy documentation).

For example,

sql_alchemy_conn = postgresql+psycopg2://airflow:<password>@<host>:5432/airflow
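As a rough sketch of how such a connection string is put together, the helper below assembles the `driver://user:password@host:port/database` form and percent-encodes the credentials. The function name `build_conn` and the credentials in the usage line are hypothetical and only serve the illustration.

```python
from urllib.parse import quote_plus


def build_conn(user, password, host, port, database,
               driver="postgresql+psycopg2"):
    """Assemble a SQLAlchemy-style URL: driver://user:password@host:port/db."""
    # Percent-encode the credentials so that characters such as '@' or '/'
    # in a password do not break the URL structure.
    return "{}://{}:{}@{}:{}/{}".format(
        driver, quote_plus(user), quote_plus(password), host, port, database
    )


# Hypothetical credentials, for illustration only.
print(build_conn("airflow", "p@ss/word", "localhost", 5432, "airflow"))
```

The encoding step matters only when the user name or password contains characters that have a meaning inside a URL; for plain alphanumeric credentials the result is the same as writing the string by hand.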
Yu You answered Sep 20 '22 20:09


To configure Airflow to use MySQL, first install MySQL, then:

  • go to the Airflow installation directory, usually /home//airflow
  • edit airflow.cfg
  • locate the line

    sql_alchemy_conn = sqlite:////home/vipul/airflow/airflow.db

and, if it is still the default SQLite setting, comment it out by adding # in front of it so that it looks like

#sql_alchemy_conn = sqlite:////home/vipul/airflow/airflow.db

  • add this line below it

    sql_alchemy_conn = mysql://:@localhost:3306/

  • save the file

  • run the command

    airflow initdb

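and you are done! (airflow initdb is the Airflow 1.x command; in Airflow 2.x the equivalent is airflow db init.)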
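The manual edit described in the steps above could also be scripted. The following is a minimal sketch using Python's standard `configparser`; the function name `switch_backend`, the temporary file, and the MySQL credentials are all hypothetical. Note that `configparser` discards comments when it rewrites a file, so unlike the manual steps it does not leave the old SQLite line commented out.

```python
import configparser
import os
import tempfile


def switch_backend(cfg_path, conn, executor="CeleryExecutor"):
    """Set sql_alchemy_conn and executor in the [core] section of cfg_path."""
    # interpolation=None leaves any literal '%' in other values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    if not parser.has_section("core"):
        parser.add_section("core")
    parser.set("core", "sql_alchemy_conn", conn)
    parser.set("core", "executor", executor)
    # Caveat: comments in the original file are lost on rewrite.
    with open(cfg_path, "w") as handle:
        parser.write(handle)


# Demonstration on a throwaway file with hypothetical values.
with tempfile.TemporaryDirectory() as tmp:
    demo_cfg = os.path.join(tmp, "airflow.cfg")
    with open(demo_cfg, "w") as handle:
        handle.write(
            "[core]\n"
            "executor = SequentialExecutor\n"
            "sql_alchemy_conn = sqlite:////tmp/airflow.db\n"
        )
    switch_backend(demo_cfg, "mysql://user:secret@localhost:3306/airflow")
    with open(demo_cfg) as handle:
        print(handle.read())
```

After a change like this, the database initialisation command from the steps above still has to be run so that Airflow creates its tables in the new database.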

vipul sharma answered Sep 19 '22 20:09