 

Creating a separate database connection for every Celery worker

I keep running into weird MySQL issues when workers execute tasks just after they are created.

We use Django 1.3, Celery 3.1.17, and djorm-ext-pool 0.5.

We start the Celery process with concurrency 3. My observation so far is that when the worker processes start, they all get the same MySQL connection. We log the DB connection id as below:

import logging
from django.db import connection

logger = logging.getLogger(__name__)
connection.cursor()  # make sure Django has actually opened its DB connection
logger.info("Task %s processing with db connection %s", str(task_id), str(connection.connection.thread_id()))

When all the workers get tasks, the first one executes successfully, but the other two give weird MySQL errors. They either fail with "MySQL server has gone away", or with a condition where Django throws a "DoesNotExist" error, even though the objects Django is querying clearly do exist.

After this error, each worker starts getting its own database connection, after which we don't see any issue.

What is the default behavior of Celery? Is it designed to share the same database connection? If so, how is the inter-process communication handled? I would ideally prefer a separate database connection for each worker.
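(For background: one common way to force a fresh connection per worker is to close any inherited connections when each worker process starts, using Celery's worker_process_init signal. A minimal sketch, not from this setup; connections.close_all() assumes Django 1.8+, so a Django 1.3 project like this one would call connection.close() instead.)

from celery.signals import worker_process_init
from django.db import connections

@worker_process_init.connect
def reset_db_connections(**kwargs):
    # Close connections inherited from the parent process so each worker
    # opens its own fresh MySQL connection on first use.
    connections.close_all()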

I tried the code mentioned in the link below, which did not work: Celery Worker Database Connection Pooling

We have also applied the fix to the Celery code suggested in https://github.com/celery/celery/issues/2453

For those who downvote the question, kindly let me know the reason for the downvote.

asked Mar 17 '16 by Venkat Kotra


People also ask

Which databases are supported by Celery?

Additionally, MongoDB, Amazon SQS, CouchDB, IronMQ, and databases (using SQLAlchemy or the Django ORM) are supported with experimental status.

How many connections can a database have?

By default, SQL Server allows a maximum of 32,767 concurrent connections, which is the maximum number of users that can simultaneously log in to the SQL Server instance.

Is Celery multi-threaded?

Celery supports two thread-based execution pools: eventlet and gevent. Here, the execution pool runs in the same process as the Celery worker itself. To be precise, both eventlet and gevent use greenlets and not threads.
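For reference, a worker using one of these pools can be started like this (an illustrative command; the tasks app name is assumed from the example below):

celery -A tasks worker --pool=gevent --concurrency=100 --loglevel=info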

Is Celery single-threaded?

celery -A tasks worker --pool=solo --loglevel=info

The worker picks up tasks from the queue and runs them in its single thread. Since there is only one thread, it cannot pick up another task until the existing task is completed.


1 Answer

Celery is started with the command below:

celery -A myproject worker --loglevel=debug --concurrency=3 -Q testqueue

myproject.py, as part of the master process, was making some queries to the MySQL database before forking the worker processes.

As part of the query flow in the main process, the Django ORM (patched by djorm-ext-pool) creates a SQLAlchemy connection pool if one does not already exist. The worker processes are then forked.

Celery, as part of its Django fixups, closes existing connections:

def close_database(self, **kwargs):
    if self._close_old_connections:
        return self._close_old_connections()  # Django 1.6
    if not self.db_reuse_max:
        return self._close_database()
    if self._db_recycles >= self.db_reuse_max * 2:
        self._db_recycles = 0
        self._close_database()
    self._db_recycles += 1

In effect, what could be happening is that the SQLAlchemy pool object, holding one unused DB connection, is copied to the three worker processes when they are forked. So the three separate pools hold three connection objects that all point to the same connection file descriptor.

When the workers ask for a DB connection while executing tasks, they all get that same unused connection from their copy of the SQLAlchemy pool, because it is marked as unused. The fact that all these connections point to the same file descriptor is what causes the "MySQL server has gone away" errors.

Connections created after that point are genuinely new and do not point to the same socket file descriptor.
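The mechanism can be seen in a standalone sketch (illustrative only, not the project's code): a socket created before os.fork() is inherited by every child, and all the copies share one kernel file descriptor, exactly like the pooled MySQL connection above.

import os
import socket

# Stand-in for the MySQL connection's underlying socket.
sock, peer = socket.socketpair()
print("parent pid %d, fd %d" % (os.getpid(), sock.fileno()))

for _ in range(3):
    if os.fork() == 0:
        # Each forked child inherits the socket object; every copy refers
        # to the same underlying kernel socket, so their reads and writes
        # interleave on one connection.
        print("child pid %d, fd %d" % (os.getpid(), sock.fileno()))
        os._exit(0)

for _ in range(3):
    os.wait()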

Solution:

In the main process, add

from django.db import connection
connection.cursor()  # forces Django to open a plain, un-pooled connection

before any other import is done, i.e. before even the djorm-ext-pool module is imported.

That way, all the DB queries in the main process use the connection created by Django outside the pool. When the Celery Django fixup closes the connection, the connection actually gets closed, as opposed to going back to the SQLAlchemy pool; the pool is therefore left with no connections in it at the time it is copied over to the workers on fork. When the workers later ask for a DB connection, SQLAlchemy returns one of the newly created connections.
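Put together, the top of myproject.py would look roughly like this (a sketch of the ordering described above, with hypothetical module names, not the exact project code):

import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

from django.db import connection
connection.cursor()  # open a plain (un-pooled) Django connection first

# All remaining imports -- including whatever pulls in djorm-ext-pool --
# happen only after the un-pooled connection exists, so queries made while
# the master process starts up never populate the SQLAlchemy pool.
from celery import Celery

app = Celery('myproject')
app.config_from_object('django.conf:settings')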

answered Sep 22 '22 by Venkat Kotra