I'm in need of some insight into Instagram's engineering when uploading files to Amazon S3. I'm just starting with S3 and I think Instagram is a good model to follow because they upload thousands of images each day. My app is somewhat similar. Users upload images, can delete their own images, and all images are public.
In my project I'm creating objects with a folder prefix to organize uploads for each user. e.g. username/filename
My object URLs look like this:
https://s3.amazonaws.com/my_bucket/username/28c3d2c6ec098bd077d6b9cb5f13869d.jpg
but Instagram's look like this:
http://distilleryimage7.s3.amazonaws.com/f4947c1004ca11e2a0c81231380ff428_7.jpg
I'm guessing that distilleryimage7 is the bucket name. I'm not sure what advantage this type of URL has. I'm also guessing that Instagram doesn't use bucket "files" and stores all images in one bucket.
Please share any best practices in S3.
Picture-based social media service Instagram has been run entirely on AWS since its inception in 2010. It ran on cloud computing service Amazon EC2, which enabled it to build and run its own software without needing its own servers.
Here at Instagram, we run our infrastructure on Amazon Web Services, running instances on their Elastic Compute Cloud (EC2).
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere.
Improved performance. Amazon S3 scales storage dynamically according to demand. It is also designed for high data durability: it automatically stores redundant copies of every S3 object across multiple systems.
This URL style is actually supported by S3 by default. For US buckets and most others, DNS resolution lets you use either:
http://my_bucket.my_domain.com
with some changes to your own DNS records (a CNAME pointing at S3), or:
http://my_bucket.s3.amazonaws.com
if you don't want to change any of your DNS records (a small primer: http://docs.amazonwebservices.com/AmazonS3/latest/dev/VirtualHosting.html#VirtualHostingCustomURLs).
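To make the two URL styles concrete, here is a minimal Python sketch that builds both forms and splits either one back into bucket and key. The function names are made up for illustration, and it only handles the legacy global endpoint s3.amazonaws.com that appears in the URLs above (region-specific endpoints and custom domains are not covered).

```python
from urllib.parse import quote, unquote, urlparse

# Assumption: the legacy global endpoint used by the URLs in the question.
S3_ENDPOINT = "s3.amazonaws.com"


def path_style_url(bucket, key):
    """Build https://s3.amazonaws.com/<bucket>/<key>."""
    return f"https://{S3_ENDPOINT}/{bucket}/{quote(key)}"


def virtual_hosted_url(bucket, key):
    """Build https://<bucket>.s3.amazonaws.com/<key>."""
    return f"https://{bucket}.{S3_ENDPOINT}/{quote(key)}"


def split_s3_url(url):
    """Return (bucket, key) for either URL style on the global endpoint."""
    parsed = urlparse(url)
    host = parsed.netloc
    path = unquote(parsed.path.lstrip("/"))
    if host == S3_ENDPOINT:
        # Path style: the bucket is the first path segment.
        bucket, _, key = path.partition("/")
        return bucket, key
    suffix = "." + S3_ENDPOINT
    if host.endswith(suffix):
        # Virtual-hosted style: the bucket is the subdomain.
        return host[: -len(suffix)], path
    raise ValueError(f"not an S3 URL on {S3_ENDPOINT}: {url}")
```

Running split_s3_url on the Instagram URL from the question gives distilleryimage7 as the bucket, which matches the guess there. Note that the virtual-hosted form needs a DNS-compatible bucket name, so a name containing an underscore such as my_bucket is only safe in the path-style form.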
The advantage of this type of URL is, of course, the common practice of using subdomains for certain assets so that they load faster in the browser.
Of course, this is only a workaround. Another approach, used by sites such as Facebook, Twitter and YouTube, is to use an entirely separate domain for this kind of content. This helps because it is a stripped-down domain dedicated to these assets (no cookies should exist on these domains either).
So this isn't really an S3 best practice but more a matter of web development in general, and it touches on the much wider question of how to build and lay out a site for a production environment.
Yes, Instagram would probably house all files in one huge bucket; that is most likely the sanest way of doing it. Then, when you get big, you would replicate parts of the bucket and split them across regions and sub-regions depending on demand, or push them to CloudFront like Vimeo does.
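For the single-bucket layout, the per-user "folder" is just a key prefix. A minimal Python sketch of generating keys in the username/filename form from the question might look like this; make_object_key and user_prefix are hypothetical helpers, and using a random UUID for the file name is an assumption, not something the question specifies.

```python
import uuid
from pathlib import PurePosixPath


def user_prefix(username):
    """Key prefix that groups one user's objects, e.g. 'alice/'."""
    return f"{username}/"


def make_object_key(username, original_filename):
    """Build a key of the form '<username>/<32 hex chars><ext>'.

    The random name avoids collisions between uploads; the extension
    is taken from the uploaded file name and lower-cased.
    """
    ext = PurePosixPath(original_filename).suffix.lower()
    return f"{user_prefix(username)}{uuid.uuid4().hex}{ext}"
```

Because S3 can list objects by prefix, a layout like this still lets you find or delete everything belonging to one user without giving each user a separate bucket.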
After reading this further, I realised that Instagram does not house everything in one bucket. That is a bit weird, especially since a bucket name must be unique across the whole of S3, including other people's accounts. As such, they probably don't use the username directly as the bucket name unless that name hasn't already been taken.
There are huge benefits to doing this, though, such as replication per user and CloudFront per user. However, there are also downsides:
A lot of separate HTTP requests when many users' images are shown. Fair enough, they all go to the S3 domain, but I am unsure how many subdomains you are allowed before SEO and browsers stop taking advantage of it (I think 6 in IE6).
Backup and replication can be harder, since you would need to do them per user rather than for a single bucket.
Moving buckets to a CDN etc. can be problematic, since you again have to do it per user.
I think I remember seeing a maximum number of buckets per account in S3, so I am unsure how well this would scale, to be honest.