
Celery Redis instance filling up despite queue looking empty

We have a Django app that needs to fetch lots of data using Celery. There are around 20 Celery workers running tasks every few minutes. We're running on Google Kubernetes Engine with a Redis queue hosted on Cloud Memorystore.

The Redis instance we're using for Celery is filling up, even when the queue is empty according to Flower. This means the Redis DB eventually fills up and Celery starts throwing errors.

In Flower I see tasks coming in and out, and I have increased workers to the point where the queue is always empty now.

If I run redis-cli --bigkeys I see:


# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest set    found so far '_kombu.binding.my-queue-name-queue' with 1 members
[00.00%] Biggest list   found so far 'default' with 611 items
[00.00%] Biggest list   found so far 'my-other-queue-name-queue' with 44705 items
[00.00%] Biggest set    found so far '_kombu.binding.celery.pidbox' with 19 members
[00.00%] Biggest list   found so far 'my-queue-name-queue' with 727179 items
[00.00%] Biggest set    found so far '_kombu.binding.celeryev' with 22 members

-------- summary -------

Sampled 12 keys in the keyspace!
Total key length in bytes is 271 (avg len 22.58)

Biggest   list found 'my-queue-name-queue' has 727179 items
Biggest    set found '_kombu.binding.celeryev' has 22 members

4 lists with 816144 items (33.33% of keys, avg size 204036.00)
0 hashs with 0 fields (00.00% of keys, avg size 0.00)
0 strings with 0 bytes (00.00% of keys, avg size 0.00)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
8 sets with 47 members (66.67% of keys, avg size 5.88)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

If I inspect the queue using LRANGE I see lots of objects like this:

"{\"body\": \"W1syNDQ0NF0sIHsicmVmZXJlbmNlX3RpbWUiOiBudWxsLCAibGF0ZXN0X3RpbWUiOiBudWxsLCAicm9sbGluZyI6IGZhbHNlLCAidGltZWZyYW1lIjogIjFkIiwgIl9udW1fcmV0cmllcyI6IDF9LCB7ImNhbGxiYWNrcyI6IG51bGwsICJlcnJiYWNrcyI6IG51bGwsICJjaGFpbiI6IG51bGwsICJjaG9yZCI6IG51bGx9XQ==\", \"content-encoding\": \"utf-8\", \"content-type\": \"application/json\", \"headers\": {\"lang\": \"py\", \"task\": \"MyDataCollectorClass\", \"id\": \"646910fc-f9db-48c3-b5a9-13febbc00bde\", \"shadow\": null, \"eta\": \"2019-08-20T02:31:05.113875+00:00\", \"expires\": null, \"group\": null, \"retries\": 0, \"timelimit\": [null, null], \"root_id\": \"beeff557-66be-451d-9c0c-dc622ca94493\", \"parent_id\": \"374d8e3e-92b5-423e-be58-e043999a1722\", \"argsrepr\": \"(24444,)\", \"kwargsrepr\": \"{'reference_time': None, 'latest_time': None, 'rolling': False, 'timeframe': '1d', '_num_retries': 1}\", \"origin\": \"gen1@celery-my-queue-name-worker-6595bd8fd8-8vgzq\"}, \"properties\": {\"correlation_id\": \"646910fc-f9db-48c3-b5a9-13febbc00bde\", \"reply_to\": \"e55a31ed-cbba-3d79-9ffc-c19a29e77aac\", \"delivery_mode\": 2, \"delivery_info\": {\"exchange\": \"\", \"routing_key\": \"my-queue-name-queue\"}, \"priority\": 0, \"body_encoding\": \"base64\", \"delivery_tag\": \"a83074a5-8787-49e3-bb7d-a0e69ba7f599\"}}"
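For reference, the base64-encoded body field in those messages can be decoded to recover the task's args and kwargs. A quick sketch using the body from the message above:

```python
import base64
import json

# The "body" value copied from the message dump above.
raw = "W1syNDQ0NF0sIHsicmVmZXJlbmNlX3RpbWUiOiBudWxsLCAibGF0ZXN0X3RpbWUiOiBudWxsLCAicm9sbGluZyI6IGZhbHNlLCAidGltZWZyYW1lIjogIjFkIiwgIl9udW1fcmV0cmllcyI6IDF9LCB7ImNhbGxiYWNrcyI6IG51bGwsICJlcnJiYWNrcyI6IG51bGwsICJjaGFpbiI6IG51bGwsICJjaG9yZCI6IG51bGx9XQ=="

# Celery's JSON message body is a 3-tuple: (args, kwargs, embed options).
args, kwargs, embed = json.loads(base64.b64decode(raw))
print(args)                  # [24444]
print(kwargs["timeframe"])   # 1d
```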

We're using django-celery-results to store results in the Django DB, so results shouldn't be going into this Redis instance, and we're using a separate Redis instance for Django's cache.

If I clear Redis with FLUSHALL, it slowly fills up again.

I'm kind of stumped about where to go next. I don't know Redis well; maybe there's something I can do to inspect the data and see what's filling it up? Maybe Flower isn't reporting properly? Maybe Celery keeps completed tasks around for a while, despite us using the Django DB for results?
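One thing worth noting from the message dump above is the eta header: messages scheduled with a future eta sit in the broker until they are due, so they show up in the Redis lists even when no task is currently runnable. A minimal sketch to count such messages, assuming they are raw JSON strings as returned by LRANGE on the queue key (the function name and sample data here are hypothetical):

```python
import json
from datetime import datetime, timezone

def count_delayed(messages):
    """Count Celery messages whose 'eta' header is still in the future.

    `messages` is a list of raw JSON strings, shaped like the
    LRANGE output shown in the question.
    """
    now = datetime.now(timezone.utc)
    delayed = 0
    for raw in messages:
        headers = json.loads(raw).get("headers", {})
        eta = headers.get("eta")
        # eta is an ISO-8601 timestamp with a UTC offset, or null.
        if eta and datetime.fromisoformat(eta) > now:
            delayed += 1
    return delayed

# Hypothetical sample: one message due far in the future, one with no eta.
sample = [
    json.dumps({"headers": {"eta": "2099-01-01T00:00:00+00:00"}}),
    json.dumps({"headers": {"eta": None}}),
]
print(count_delayed(sample))  # 1
```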

Thanks loads for any help.

asked Aug 20 '19 by Ludo



1 Answer

It sounds like Redis is not set up to delete completed items, or to report and delete failed items; that is, it may be putting the tasks on the list, but it's not taking them off.

Check out the PyPI packages rq, django-rq, and django-rq-scheduler.

You can read a little about how this should work here: https://python-rq.org/docs/

answered Nov 15 '22 by jojo