I'm currently building a system where S3 will be used as a persistent hash-set (the S3 URL is inferred from the data) by lots of computers across the Internet. If two nodes store the same data, it will be stored under the same key and therefore not stored twice. When an object is removed, I need to know whether some other node(s) are still using that data; in that case I will not remove it.
Right now I've implemented it by adding a list of the storing nodes as part of the data written to S3. So when a node stores the data, the following happens:

1. Read the object from S3.
2. Deserialize it.
3. Add the new node's id to the list of storing nodes.
4. Serialize the new object.
5. Write it back to S3.
This creates a form of idempotent reference counting. Since requests over the Internet can be quite unreliable, I don't want to simply count the number of storing nodes; that's why I store a list instead of a counter, in case a node sends the same request more than once.
This approach works as long as two nodes are not writing simultaneously. S3 doesn't (as far as I know) provide any way to lock the object so that all these 5 steps become atomic.
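For reference, a minimal sketch of the read-modify-write cycle described above, using boto3; the bucket name, key handling, and JSON encoding of the node list are my assumptions, not part of the original design:

```python
import json
import boto3

s3 = boto3.client("s3")

BUCKET = "my-hashset-bucket"  # assumed bucket name

def register_node(key: str, node_id: str) -> None:
    """Non-atomic read-modify-write: another node writing the same key
    between the GET and the PUT can silently overwrite this update."""
    # 1. Read the object from S3.
    obj = s3.get_object(Bucket=BUCKET, Key=key)
    payload = json.loads(obj["Body"].read())
    # 2.-3. Deserialize and add this node's id (a set keeps the step idempotent).
    nodes = set(payload.get("nodes", []))
    nodes.add(node_id)
    payload["nodes"] = sorted(nodes)
    # 4.-5. Serialize and write the object back.
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload).encode())
```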
How would you solve this concurrency issue? I'm considering implementing some form of optimistic concurrency. How should I do that for S3? Should I perhaps use a completely different approach?
Amazon S3 doesn't have any limits for the number of connections made to your bucket.
Amazon S3 now provides increased performance to support at least 3,500 requests per second to add data and 5,500 requests per second to retrieve data, which can save significant processing time for no additional charge.
Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 terabytes. The largest object that can be uploaded in a single PUT is 5 gigabytes. For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability.
You can increase your read or write performance by using parallelization. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second. Similarly, you can scale write operations by writing to multiple prefixes.
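To illustrate the prefix idea, here is a small sketch that shards keys across a fixed number of prefixes; the hash-based shard choice and the prefix naming scheme are assumptions for the example:

```python
import hashlib

NUM_PREFIXES = 10  # assumed shard count

def sharded_key(key: str) -> str:
    """Spread keys over NUM_PREFIXES prefixes so reads and writes can be parallelized."""
    shard = int(hashlib.sha256(key.encode()).hexdigest(), 16) % NUM_PREFIXES
    return f"shard-{shard:02d}/{key}"
```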
Consider first separating the lock list from your (protected) data. Create a separate bucket specific to your data to contain the lock list (the bucket name should be a derivative of your data object's name). Use individual files in that second bucket (one per node, with the object name derived from the node name). Nodes add a new object to the second bucket before accessing the protected data, and remove their object from the second bucket when they're finished.
This lets you enumerate the second bucket to determine whether your data is locked, and it allows two nodes to update the lock list simultaneously without conflict.
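A minimal sketch of that lock-list idea with boto3; for simplicity it assumes a single shared lock bucket with a per-object key prefix rather than one bucket per data object, and the bucket and key names are made up:

```python
import boto3

s3 = boto3.client("s3")

LOCK_BUCKET = "my-hashset-locks"  # assumed: one shared bucket for all lock lists

def lock_key(data_key: str, node_id: str) -> str:
    # One lock object per (data object, node) pair.
    return f"{data_key}/locks/{node_id}"

def acquire(data_key: str, node_id: str) -> None:
    """Announce interest in the data object before touching it."""
    s3.put_object(Bucket=LOCK_BUCKET, Key=lock_key(data_key, node_id), Body=b"")

def release(data_key: str, node_id: str) -> None:
    """Withdraw interest when this node no longer stores the data."""
    s3.delete_object(Bucket=LOCK_BUCKET, Key=lock_key(data_key, node_id))

def holders(data_key: str) -> list[str]:
    """Enumerate which nodes still reference the data object."""
    resp = s3.list_objects_v2(Bucket=LOCK_BUCKET, Prefix=f"{data_key}/locks/")
    return [obj["Key"].rsplit("/", 1)[-1] for obj in resp.get("Contents", [])]
```

A node would then delete the protected object only when `holders()` contains no other node ids.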
To add to what amadeus said: if your needs aren't relational, you can even use AWS's SimpleDB, which is significantly cheaper.