
How does Instagram use Amazon S3?

I'm in need of some insight into Instagram's engineering when uploading files to Amazon S3. I'm just starting with S3 and I think Instagram is a good model to follow because they upload thousands of images each day. My app is somewhat similar. Users upload images, can delete their own images, and all images are public.

In my project I'm creating objects with a folder prefix to organize uploads for each user. e.g. username/filename

My object URLs look like this:

https://s3.amazonaws.com/my_bucket/username/28c3d2c6ec098bd077d6b9cb5f13869d.jpg

but Instagram:

http://distilleryimage7.s3.amazonaws.com/f4947c1004ca11e2a0c81231380ff428_7.jpg

I'm guessing that distilleryimage7 is the bucket name, but I'm not sure what advantage this type of URL has. I'm also guessing that Instagram doesn't use "folders" (key prefixes) inside a bucket and stores all images in one bucket.

Please share any best practices in S3.

CyberJunkie asked Sep 22 '12 19:09



1 Answer

This URL style is supported by S3 out of the box. For US buckets, and most others, S3 can work out the bucket from the hostname (virtual hosting), which lets you use either:

http://my_bucket.my_domain.com

with some changes to your own DNS records, or:

http://my_bucket.s3.amazonaws.com

if you don't want to touch your DNS records at all (the custom-domain variant is set up with a CNAME; a small primer: http://docs.amazonwebservices.com/AmazonS3/latest/dev/VirtualHosting.html#VirtualHostingCustomURLs).
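To make the difference concrete, here is a minimal sketch that builds both URL forms for the same object. The helper names `path_style_url` and `virtual_hosted_url` are made up for illustration and are not part of any AWS SDK; the endpoint is simply the one that appears in the URLs from the question.

```python
from urllib.parse import quote

# Endpoint taken from the URLs in the question (US Standard).
S3_ENDPOINT = "s3.amazonaws.com"


def path_style_url(bucket: str, key: str, scheme: str = "https") -> str:
    """Bucket goes in the path: https://s3.amazonaws.com/<bucket>/<key>"""
    # quote() keeps "/" by default, so "username/file.jpg" stays a nested key.
    return f"{scheme}://{S3_ENDPOINT}/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket: str, key: str, scheme: str = "http") -> str:
    """Bucket goes in the hostname: http://<bucket>.s3.amazonaws.com/<key>"""
    return f"{scheme}://{bucket}.{S3_ENDPOINT}/{quote(key)}"


if __name__ == "__main__":
    # The asker's current layout: one bucket, per-user key prefix.
    print(path_style_url("my_bucket", "username/28c3d2c6ec098bd077d6b9cb5f13869d.jpg"))
    # The Instagram-style layout: bucket name as a subdomain.
    print(virtual_hosted_url("distilleryimage7", "f4947c1004ca11e2a0c81231380ff428_7.jpg"))
```

Both forms point at the same kind of object; only the place where the bucket name appears changes.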

The advantage of this type of URL is the common practice of serving certain assets from subdomains to make loading faster in the browser, since browsers cap the number of parallel connections per hostname.

Of course this is a workaround. Another approach, used by sites such as Facebook, Twitter and YouTube, is to use an entirely separate domain for this kind of content. This helps because it is a stripped-down domain dedicated to these assets, so no cookies get sent with each request (no cookies should exist on these domains at all).

So this isn't really an S3 best practice but a general web development one, and it is part of the much wider topic of how to build and lay out a site for a production environment.
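As a rough illustration of the subdomain idea above, the sketch below spreads objects over a few numbered buckets so that each image always resolves to the same hostname. This is an assumption about how such a scheme could look, not Instagram's actual method; the bucket prefix `myassets`, the shard count and the function names are all hypothetical.

```python
import zlib

# Hypothetical values, chosen only for illustration.
BUCKET_PREFIX = "myassets"
NUM_SHARDS = 4


def shard_bucket(key: str) -> str:
    """Pick one of NUM_SHARDS buckets for a key, deterministically."""
    # crc32 is stable across runs (unlike the built-in hash()), so a given
    # key always maps to the same bucket and the URL stays cacheable.
    index = zlib.crc32(key.encode("utf-8")) % NUM_SHARDS
    return f"{BUCKET_PREFIX}{index}"


def asset_url(key: str) -> str:
    """Virtual-hosted URL on the shard bucket that owns this key."""
    return f"http://{shard_bucket(key)}.s3.amazonaws.com/{key}"


if __name__ == "__main__":
    for name in ("a.jpg", "b.jpg", "c.jpg"):
        print(asset_url(name))
```

The important property is that the mapping is deterministic: if the same image moved between hostnames, browser and proxy caches would stop helping.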

Yes, Instagram would house all files in one huge bucket; this is most likely the sanest way to do it. When you get big, you would replicate parts of the bucket and split them across regions and sub-regions depending on demand, or push them to CloudFront like Vimeo does.

Edit

After reading this further, I realised that Instagram does not house everything in one bucket. That is a bit odd, especially since a bucket name must be unique across the whole of S3, including other people's accounts. As such, they probably don't use the username directly as the bucket name, unless that name hasn't already been taken.

There are huge benefits to doing this, though, such as per-user replication and per-user CloudFront distributions. However, there are also downsides:

  • A lot of separate HTTP requests when many users' images are shown. Fair enough, they all go to the S3 domain, but I am unsure how many subdomains browsers (and SEO) let you take advantage of (I think 6 in IE6).

  • Backup and replication can be harder, since you would need to do them per user rather than for a single bucket.

  • Moving buckets to a CDN etc. can be problematic, since again you have to do it per user.

  • I think I remember seeing a maximum number of buckets per account in S3 (100 by default, as far as I recall), so I am unsure how well this would scale.

Sammaye answered Sep 23 '22 17:09