While the post @ http://highscalability.com/amazon-architecture explains Amazon's architecture in general, I am interested in knowing how Amazon S3 is implemented.
Some of my guesses are
Is it be possible to implement something similar to this on a much smaller scale using scripting languages like Python or PHP?
The architecture of Amazon S3 is designed to be programming language-neutral, using AWS-supported interfaces to store and retrieve objects. You can access S3 and AWS programmatically by using the Amazon S3 REST API. The REST API is an HTTP interface to Amazon S3.
You can use Amazon S3 Block Public Access in Amazon Web Services China (Beijing) region, operated by Sinnet, and in Amazon Web Services China (Ningxia) region, operated by NWCD. Please visit the Amazon S3 Developer Guide to learn more about Amazon S3 Block Public Access.
S3 Standard is designed for 99.99% availability and Standard - IA is designed for 99.9% availability. Both are backed by the Amazon S3 Service Level Agreement.
Amazon S3 is a durable, secure, simple, and fast storage service, while Amazon S3 Glacier is used for archiving solutions. Use S3 if you need low latency or frequent access to your data. Use S3 Glacier for low storage cost, and you do not require millisecond access to your data.
Amazon S3 is implemented using the architecture described in the Dynamo Paper:
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
The paper explains consistent hashing, and how and why the guarantee is "eventual consistency".
The conflict resolution they talk about for Dynamo is not exposed to users of S3. It is used internally in Amazon's applications, but for S3, the only conflict resolution is last write wins.
Edit: Werner Vogels has said "Dynamo is not directly exposed externally as a web service; however, Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3." http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
I would emphasize that he isn't saying S3 and Dynamo share components, he explicitly says that Dynamo itself is one of the technologies that power S3. Everything I've seen from S3, including the caveats, is accounted for by assuming S3 is a fancy web services wrapper around Dynamo with authentication, accounting, and a last-write-wins conflict resolve that is invisible to the user.
The original question was about the underlying storage mechanism for S3. It is explicitly not a distributed file system like HDFS or a non-relational database like CouchDB. Dynamo fills this role.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With