I understand that to achieve its scalability and reliability, SQS parallelizes resources extensively. It uses redundant servers even for small queues, and the messages posted to a queue are stored redundantly as multiple copies. These are the factors that prevent it from offering exactly-once delivery, as RabbitMQ can. I have even seen deleted messages being delivered again.
The implication for developers is that they need to be prepared for messages to be delivered more than once. Amazon claims this is not a problem, but if it is, then the developer must use some synchronization construct such as a database transaction lock or a DynamoDB conditional write (see the sketch below). Both of these reduce scalability.
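For illustration, here is a minimal sketch of the conditional-write approach using boto3. The `processed_messages` table name and its key are assumptions of mine, not something defined in the question; the idea is simply that the first consumer to claim a given SQS `MessageId` wins, and duplicate deliveries are detected and skipped:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def claim_message(message_id: str) -> bool:
    """Return True if this consumer is the first to claim the message.

    The conditional write fails if an item with this MessageId already
    exists, so a duplicate delivery can be detected and skipped.
    """
    try:
        dynamodb.put_item(
            TableName="processed_messages",  # hypothetical dedup table
            Item={"message_id": {"S": message_id}},
            ConditionExpression="attribute_not_exists(message_id)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # already claimed by another (or an earlier) delivery
        raise
```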
In light of the duplicate-delivery problem, how does the message-invisibility feature hold up? The message is not guaranteed to be invisible. If the developer has to make their own arrangements for synchronization, what benefit does the invisibility period provide? I have seen messages re-delivered even while they were supposed to be invisible.
Here are some references:
A single Amazon SQS message queue can contain an unlimited number of messages. However, there is a quota of 120,000 for the number of inflight messages for a standard queue and 20,000 for a FIFO queue.
Amazon SQS has no transaction support, so messages might be retrieved twice. Applications have to be written in an idempotent way so that they can tolerate receiving a message twice. Amazon SQS has a maximum message size of 256 KB per message, so larger messages will fail to be sent.
To send messages larger than 256 KB, you can use the Amazon SQS Extended Client Library for Java. This library allows you to send an Amazon SQS message that contains a reference to a message payload in Amazon S3. The maximum payload size is 2 GB. The default visibility timeout for a message is 30 seconds.
Amazon SQS is engineered to provide “at least once” delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies.
Message invisibility solves a different problem from guaranteeing one-and-only-one delivery. Consider a long-running operation on an item in the queue. If the processor craps out during the operation, you don't want to delete the message; you want it to reappear and be handled again by a different processor.
So the pattern is:

1. Receive the message; it becomes invisible to other consumers for the visibility timeout.
2. Process it.
3. Delete it from the queue only after processing has completed successfully.
So whether you get duplicate delivery or not, you still need to ensure that you actually process the item in the queue. If you delete it as soon as you pull it off the queue and then your server dies, you may lose that message forever. The visibility timeout enables aggressive scaling through the use of spot instances and guarantees (using the pattern above, sketched in code below) that you won't lose a message.
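To make the pattern concrete, here is a minimal sketch using boto3. The queue URL and the `process` function are placeholders of my own, and the processing step is assumed to be idempotent, as discussed above:

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

def consume_once():
    """Receive up to one message, process it, and delete it only on success."""
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,    # long polling
        VisibilityTimeout=60,  # message stays invisible while we work on it
    )
    for msg in resp.get("Messages", []):
        try:
            process(msg["Body"])  # hypothetical, idempotent processing step
        except Exception:
            # Do NOT delete: the visibility timeout will expire and the
            # message will reappear for another consumer to retry.
            raise
        # Delete only after successful processing.
        sqs.delete_message(
            QueueUrl=QUEUE_URL,
            ReceiptHandle=msg["ReceiptHandle"],
        )
```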
But it doesn't guarantee once-and-only-once delivery, and I don't think it's designed for that problem. I also don't think it's an insurmountable problem. In our case (and I can see why I've never noticed the issue before), we're writing results to S3, so it's no big deal if the same file is overwritten with the same data. Of course, if it's a debit transaction going to a bank account, you'd probably want some sort of correlation ID... and most systems already have those in there. So if you get a duplicate correlation value, you throw an exception and move on.
Good question. Highlighted something for me.