The problem my team and I have been trying to solve involves multiple EC2 instances, each with its own independent, parallel access to the same S3 bucket. A race condition arises when several clients attempt to download the same file from that bucket. Each client attempts to read the file, run some business logic, and then delete the file. Since there are many opportunities for delay, multiple instances end up running the business logic on the same file.
Some advice would be greatly appreciated on how engineers have been implementing locking mechanisms with their S3 clients.
Our brainstormed approach: upload a .lock file to the S3 bucket with information about which instance currently holds the lock. When the instance that holds the lock finishes processing, it deletes its lock. (Issues arise while the lock file is being uploaded: the locking mechanism itself has a race condition.)
Hmm... you're going to have a race condition with the lock file now: multiple nodes can upload the same lock file at the same time!
So you'll need something a little more sophisticated, as S3 is not designed for coordinating concurrent clients, and this can be quite inconvenient.
The obvious way to deal with this is to use SQS (Simple Queue Service), which is built for concurrent consumers.
So in your case, all of the nodes connect to the same queue and wait for work. Something adds an entry to the queue for each file in S3 that needs to be processed. One of the nodes picks up the entry, processes the file, deletes the file, and deletes the entry from the queue.
That way a file is not processed by several nodes at once, and you get elegant scaling as well.
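The worker side of this flow could be sketched roughly as below. This is only a sketch under assumptions, not a definitive implementation: the message body format (`{"bucket": ..., "key": ...}`), the function name `process_next_message`, and the `handle_file` callback are all made up for illustration. The `sqs` and `s3` arguments are assumed to be boto3 clients (for example `boto3.client("sqs")` and `boto3.client("s3")`), passed in so the function itself has no hard dependency.

```python
import json


def process_next_message(sqs, s3, queue_url, handle_file):
    """Take at most one entry off the queue, process the file it points to, clean up.

    Returns True if an entry was processed, False if the queue was empty.
    `sqs` and `s3` are assumed to be boto3 clients; `handle_file` is a
    hypothetical callback that runs the business logic on the file bytes.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,  # long polling: wait up to 20 s for work
    )
    messages = response.get("Messages", [])
    if not messages:
        return False

    message = messages[0]
    # Assumed body format: a JSON pointer to the S3 object.
    pointer = json.loads(message["Body"])
    bucket, key = pointer["bucket"], pointer["key"]

    obj = s3.get_object(Bucket=bucket, Key=key)
    handle_file(obj["Body"].read())

    # Delete the file first, then the queue entry, as described above.
    s3.delete_object(Bucket=bucket, Key=key)
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```

Each node would call this in a loop. Note that a standard SQS queue delivers at least once, so the queue's visibility timeout should be longer than the processing time, otherwise the entry can reappear for another node while the first one is still working.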
The outstanding issue, however, is what scans S3 in the first place to put work on the queue. This is probably where your difficulty will arise.
I think you have two options:
Option 1: use a Lambda function. This is rather elegant. You can configure a Lambda to fire when something gets added to S3. The Lambda then puts a pointer to the file on the queue, to be picked up and processed by the EC2 instances.
The problem with the Lambda is that your application becomes a little more distributed, i.e. you can't just look in the application code for the behaviour, you've got to look in the Lambda as well. Though I guess this Lambda is not particularly heavyweight.
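To give an idea of how lightweight such a Lambda can be, here is a minimal sketch. The `build_handler` factory, the `QUEUE_URL` environment variable mentioned in the comment, and the JSON pointer format are assumptions for illustration. The event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with the key URL-encoded) follows the S3 event notification format.

```python
import json
from urllib.parse import unquote_plus


def build_handler(sqs, queue_url):
    """Return a Lambda-style handler that forwards S3 events to a queue.

    `sqs` is assumed to be a boto3 SQS client. In a real deployment one
    might write something like:
        handler = build_handler(boto3.client("sqs"), os.environ["QUEUE_URL"])
    where QUEUE_URL is a hypothetical environment variable.
    """
    def handler(event, context):
        enqueued = 0
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            sqs.send_message(
                QueueUrl=queue_url,
                MessageBody=json.dumps({"bucket": bucket, "key": key}),
            )
            enqueued += 1
        return {"enqueued": enqueued}

    return handler
```

Passing the client in, rather than creating it inside the handler, is just a design choice that keeps the sketch easy to try out with a stub.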
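Option 2: let all the EC2 instances monitor S3, but when they find work to do they add it to an SQS FIFO queue. This is a relatively new queue type from AWS where you have guaranteed order and exactly-once processing. Duplicates are detected through the message deduplication ID within a five-minute window, so every node must derive the same ID for the same file (for example from the S3 key). Thus you can guarantee that even though multiple nodes found the same S3 file, only one node will process it.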
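A rough sketch of how each node might enqueue a file it has found, so that duplicates from other nodes collapse into one entry. The helper names `dedup_id_for` and `enqueue_once` are hypothetical, and `sqs` is assumed to be a boto3 SQS client pointing at a FIFO queue. The key is hashed because it may be longer than, or contain characters not allowed in, a deduplication ID.

```python
import hashlib
import json


def dedup_id_for(bucket, key):
    """Derive a stable ID from the file location (64 hex characters).

    Every node computes the same value for the same file, which is what
    lets the FIFO queue drop the duplicates.
    """
    return hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).hexdigest()


def enqueue_once(sqs, fifo_queue_url, bucket, key):
    """Send a pointer to the file; repeats of the same file are deduplicated by SQS."""
    file_id = dedup_id_for(bucket, key)
    return sqs.send_message(
        QueueUrl=fifo_queue_url,
        MessageBody=json.dumps({"bucket": bucket, "key": key}),
        # One group per file, so different files can be processed in parallel.
        MessageGroupId=file_id,
        MessageDeduplicationId=file_id,
    )
```

If a node rediscovers the file after the deduplication window has passed and the file still exists, it would be enqueued again, so the worker should still tolerate a missing file.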