
Does SQS really send multiple S3 PUT object records per message?

I've set up an S3 bucket to emit an event on PUT object to SQS, and I'm handling the SQS queue in an EB worker tier.

The schema for the message that SQS sends is here: http://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html

Records is an array, implying that there can be multiple records sent in one POST to my worker's endpoint. Does this actually happen? Or will my worker only ever receive one record per message?

The worker can only return one response, either 200 (message handled successfully) or non-200 (message not handled successfully, which puts it back into the queue), regardless of how many records in the message it receives.

So if my worker receives multiple records in a message, and it handles some successfully (say by doing something with side effects such as inserting into a database) but fails on one or more, how should I handle that? If I return 200, then the ones that failed will not be retried. But if I return non-200, then the ones that were handled successfully will be retried unnecessarily, and possibly re-inserted. So I'd have to make my worker smart enough to retry only the failed ones -- which is logic I'd prefer not to have to write.

This would be much easier if only one record were ever sent per message. So if that's the case in practice, despite Records being an array, I'd really like to know!

sandinmyjoints asked Feb 12 '15 17:02


1 Answer

To be clear, it's not the records that "SQS sends." It's the records that S3 sends to SQS (or to SNS, or to Lambda).

Currently, all S3 event notifications have a single event per notification message. We might include multiple records as we add new event types in the future. This is also a message format that is shared across other AWS services, and other services can include multiple records.

— https://forums.aws.amazon.com/thread.jspa?messageID=592264&#592264

So, for the moment, it appears there's only one record per message.

However, it would be a mistake to assume your application need not be prepared to handle repeated or duplicate messages. In a large distributed system like SQS, it is extremely difficult to guarantee that a duplicate delivery can never happen, however unlikely:

Q: How many times will I receive each message?

Amazon SQS is engineered to provide “at least once” delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies.

— http://aws.amazon.com/sqs/faqs/
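One way to make repeated delivery harmless is to record each object event under a unique key and ignore anything already recorded. The following is only a minimal sketch, not a definitive implementation: the table name, the helper names `open_store` and `handle_record`, the use of SQLite, and the choice of (bucket, key, sequencer) as the deduplication key are all assumptions made for illustration. The `sequencer` field is part of the S3 event record for PUT events.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a (hypothetical) store that remembers which events were handled."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS processed_objects (
               bucket     TEXT NOT NULL,
               object_key TEXT NOT NULL,
               sequencer  TEXT NOT NULL,
               PRIMARY KEY (bucket, object_key, sequencer)
           )"""
    )
    return conn


def handle_record(conn, record):
    """Record one S3 event; return True if it was new, False if a duplicate."""
    s3 = record["s3"]
    row = (s3["bucket"]["name"], s3["object"]["key"], s3["object"]["sequencer"])
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_objects "
            "(bucket, object_key, sequencer) VALUES (?, ?, ?)",
            row,
        )
        # rowcount is 1 when the row was inserted, 0 when it already existed
        return cur.rowcount == 1
```

With this shape, returning non-200 for a message is safe even if part of the work had already succeeded, because the retry simply finds the existing row and skips it.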

Incidentally, in my platform, more than one entry in the records array is considered an error, causing the message to be abandoned and sent to the dead letter queue for review.
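A rough sketch of that "exactly one record or reject" rule for a worker endpoint might look like the following. This is not the code from my platform; the function name `status_for_message`, the `process_record` callback, and the specific 400/500 codes are hypothetical choices. The only behaviour relied on is the one described in the question: a 200 response marks the message as handled, and anything else returns it to the queue. Whether a repeatedly rejected message ends up in a dead letter queue depends on how the queue is configured.

```python
import json


def status_for_message(body, process_record):
    """Return the HTTP status the worker should send back for one queue message."""
    try:
        records = json.loads(body)["Records"]
    except (ValueError, KeyError, TypeError):
        return 400  # not JSON, or not shaped like an S3 notification

    # Treat anything other than exactly one record as an error instead of
    # trying to handle partial success within a single message.
    if not isinstance(records, list) or len(records) != 1:
        return 400

    try:
        process_record(records[0])
    except Exception:
        return 500  # processing failed; the message should be retried
    return 200
```

A request handler in any web framework could call this with the raw POST body and return the resulting status code unchanged.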

Michael - sqlbot answered Nov 13 '22 07:11