
Celery SQS + Duplication of tasks + SQS visibility timeout

Tags: python, celery

Most of my Celery tasks have an ETA longer than the maximum visibility timeout allowed by Amazon SQS.

Celery documentation says:

This causes problems with ETA/countdown/retry tasks where the time to execute exceeds the visibility timeout; in fact if that happens it will be executed again, and again in a loop.

So you have to increase the visibility timeout to match the time of the longest ETA you’re planning to use.

At the same time it also says that:

The maximum visibility timeout supported by AWS as of this writing is 12 hours (43200 seconds).

What should I do to avoid tasks being executed multiple times by my workers if I am using SQS?
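
For reference, a minimal sketch of the setting involved and of a check for the problematic case. The `broker_transport_options` / `visibility_timeout` names are the ones the Celery docs use; the helper `eta_exceeds_visibility_timeout` is a hypothetical name made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Limit quoted above from the Celery docs: 12 hours.
SQS_MAX_VISIBILITY_TIMEOUT = 43200  # seconds

# Celery configuration value for the SQS transport.
broker_transport_options = {"visibility_timeout": SQS_MAX_VISIBILITY_TIMEOUT}


def eta_exceeds_visibility_timeout(eta, now=None,
                                   visibility_timeout=SQS_MAX_VISIBILITY_TIMEOUT):
    """Return True if the task would still be waiting for its ETA
    when SQS makes the message visible again (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (eta - now).total_seconds() > visibility_timeout


if __name__ == "__main__":
    eta = datetime.now(timezone.utc) + timedelta(days=2)
    print(eta_exceeds_visibility_timeout(eta))
```

Any task for which this check is true is at risk of being delivered a second time while the first copy is still waiting.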

Alexander Tyapkov asked Dec 23 '16 11:12



1 Answer

Generally it's not a good idea to have tasks with very long ETAs.

First of all, there is the "visibility_timeout" issue. You probably don't want a very long visibility timeout either: if a worker crashes one minute before the task is due to run, the queue still waits for the visibility timeout to expire before delivering the task to another worker, and I guess you don't want that wait to be another month.

From the Celery docs:

Note that Celery will redeliver messages at worker shutdown, so having a long visibility timeout will only delay the redelivery of ‘lost’ tasks in the event of a power failure or forcefully terminated workers.
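
To put a number on that delay, here is a rough sketch of the arithmetic, assuming the visibility timeout starts counting when the worker receives the message and that the worker dies without returning it. The function name `redelivery_wait` is made up for this example.

```python
from datetime import datetime, timedelta, timezone


def redelivery_wait(received_at, crashed_at, visibility_timeout):
    """Seconds between a hard worker crash and the moment SQS makes
    the message visible to other consumers again (hypothetical helper)."""
    visible_again_at = received_at + timedelta(seconds=visibility_timeout)
    return max((visible_again_at - crashed_at).total_seconds(), 0.0)


if __name__ == "__main__":
    received = datetime(2016, 12, 23, 11, 0, tzinfo=timezone.utc)
    crashed = received + timedelta(hours=1)
    # 12 hour timeout, worker killed one hour after receiving the task.
    print(redelivery_wait(received, crashed, 43200))
```

The longer the timeout, the longer a lost task sits invisible before another worker can pick it up.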

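Also, SQS limits how many messages can be received but not yet acknowledged at the same time. As far as I know, a task with an ETA stays unacknowledged until it actually runs, so it counts against this limit for as long as it waits.

SQS calls these "in-flight messages". From http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html: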

A message is considered to be in flight after it's received from a queue by a consumer, but not yet deleted from the queue.

For standard queues, there can be a maximum of 120,000 inflight messages per queue. If you reach this limit, Amazon SQS returns the OverLimit error message. To avoid reaching the limit, you should delete messages from the queue after they're processed. You can also increase the number of queues you use to process your messages.

For FIFO queues, there can be a maximum of 20,000 inflight messages per queue. If you reach this limit, Amazon SQS returns no error messages.
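
To see how close a queue is to that limit, the in-flight count can be read from the queue attributes. A minimal sketch, assuming a boto3 SQS client: `get_queue_attributes` and the `ApproximateNumberOfMessagesNotVisible` attribute are part of the SQS API, while `in_flight_count`, `FakeSQSClient` and the queue URL are placeholders invented here so the example runs without AWS credentials.

```python
def in_flight_count(sqs_client, queue_url):
    """Return the approximate number of in-flight messages for a queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessagesNotVisible"],
    )
    # SQS returns attribute values as strings.
    return int(response["Attributes"]["ApproximateNumberOfMessagesNotVisible"])


class FakeSQSClient:
    """Stand-in for boto3.client("sqs"), for offline demonstration only."""

    def get_queue_attributes(self, QueueUrl, AttributeNames):
        return {"Attributes": {"ApproximateNumberOfMessagesNotVisible": "1500"}}


if __name__ == "__main__":
    # With real AWS access this would be: client = boto3.client("sqs")
    client = FakeSQSClient()
    print(in_flight_count(client, "https://example.invalid/my-queue"))
```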

I see two possible solutions: either use RabbitMQ instead, which doesn't rely on visibility timeouts (there are "RabbitMQ as a service" offerings if you don't want to manage your own), or change your code to use really small ETAs (best practice).
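
One way the "small ETAs" option could look is to have the task re-enqueue itself in short hops until the real due time arrives, so no single message waits longer than the visibility timeout. This is only a sketch of one possible approach, not something taken from the Celery docs: `next_countdown`, `MAX_HOP`, `run_when_due`, `do_work` and `app` are hypothetical names, and the 15 minute hop length is an arbitrary assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumed hop length, chosen to stay well under the visibility timeout.
MAX_HOP = 15 * 60  # seconds


def next_countdown(due_at, now, max_hop=MAX_HOP):
    """Seconds to wait before the next hop, or 0 if the work is due now."""
    remaining = (due_at - now).total_seconds()
    if remaining <= 0:
        return 0
    return min(remaining, max_hop)


# How this might be wired into a Celery task (not executed here;
# `app` and `do_work` are placeholders):
#
# @app.task
# def run_when_due(due_at_iso, payload):
#     due_at = datetime.fromisoformat(due_at_iso)
#     wait = next_countdown(due_at, datetime.now(timezone.utc))
#     if wait > 0:
#         # Not due yet: schedule another short hop and finish this one.
#         run_when_due.apply_async((due_at_iso, payload), countdown=wait)
#         return
#     do_work(payload)

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(next_countdown(now + timedelta(hours=2), now))
```

The trade-off is that every hop is a new message, so long delays cost more SQS requests.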

These are my 2 cents, maybe @asksol can provide some extra insights.

giorgosp answered Oct 23 '22 12:10