
Amazon S3 architecture [closed]

While the post at http://highscalability.com/amazon-architecture explains Amazon's architecture in general, I am interested in how Amazon S3 specifically is implemented.

Some of my guesses are:

  1. A distributed file system like HDFS http://hadoop.apache.org/core/docs/current/hdfs_design.html
  2. A non-relational persistent DB like CouchDB http://couchdb.apache.org/

Is it possible to implement something similar on a much smaller scale using scripting languages like Python or PHP?

Sukumar asked Feb 19 '09 06:02




1 Answer

Amazon S3 is implemented using the architecture described in the Dynamo Paper:

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

The paper explains consistent hashing, and how and why the guarantee is "eventual consistency".
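To make the consistent hashing idea concrete, here is a minimal sketch in Python (one of the languages the question mentions). It is an illustration of the general technique only, not Amazon's code: the `HashRing` class, the node names, the use of MD5, and the count of 64 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring (MD5 is used only to spread keys evenly)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes      # virtual nodes per physical node (assumed value)
        self._positions = []      # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            if pos in self._owner:
                continue          # extremely unlikely collision: keep the existing owner
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._positions.remove(pos)

    def get_node(self, key):
        """Return the node owning the first ring position clockwise from the key."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"object-{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys that now belong to the new node change owner; the rest stay put.
print(len(moved), all(after[k] == "node-d" for k in moved))
```

The point of the design is visible in the last lines: adding a server relocates only the keys that land on the new server, rather than reshuffling everything as a plain `hash(key) % n` scheme would.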

The conflict resolution they talk about for Dynamo is not exposed to users of S3. It is used internally in Amazon's applications, but for S3, the only conflict resolution is last write wins.
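As a rough illustration of what "last write wins" means, the sketch below picks a single winner among conflicting versions of the same key by timestamp. How S3 does this internally is not public, so everything here is an assumption for the example: the `Version` record, its field names, and the tie-break on replica id are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One replica's view of an object (hypothetical record for illustration)."""
    timestamp: float   # time the write was accepted
    replica_id: str    # used only to break exact timestamp ties deterministically
    value: bytes


def resolve_lww(versions):
    """Return the version with the latest timestamp; older writes are discarded.

    Raises ValueError if `versions` is empty.
    """
    return max(versions, key=lambda v: (v.timestamp, v.replica_id))


# Two replicas accepted different writes for the same key.
old = Version(timestamp=100.0, replica_id="r1", value=b"first upload")
new = Version(timestamp=105.5, replica_id="r2", value=b"second upload")
winner = resolve_lww([old, new])
print(winner.value)
```

Unlike the vector-clock reconciliation described in the paper, this never hands multiple versions back to the client: the losing write is silently dropped, which is why the resolution is invisible to the user.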

Edit: Werner Vogels has said "Dynamo is not directly exposed externally as a web service; however, Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3." http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

I would emphasize that he isn't saying S3 and Dynamo share components; he explicitly says that Dynamo itself is one of the technologies that power S3. Everything I've seen from S3, including the caveats, is accounted for by assuming S3 is a fancy web services wrapper around Dynamo with authentication, accounting, and a last-write-wins conflict resolution that is invisible to the user.

The original question was about the underlying storage mechanism for S3. It is explicitly not a distributed file system like HDFS or a non-relational database like CouchDB. Dynamo fills this role.

Kevin Peterson answered Oct 09 '22 12:10
