
Do we need directory structure logic for storing millions of images on Amazon S3/Cloudfront?

In order to support millions of potential images we have previously followed this sort of directory structure:

/profile/avatars/44/f2/47/48px/44f247d4e3f646c66d4d0337c6d415eb.jpg

The filename is md5 hashed, then we extract the first 6 characters in the string and build the folder structure from that.

So in the above example the filename:

44f247d4e3f646c66d4d0337c6d415eb.jpg

produces a directory structure of:

/44/f2/47/

We always did this in order to minimize the number of photos in any single directory, ultimately to aid filesystem performance.
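As a rough illustration of the scheme described above, here is a minimal Python sketch. The function name `sharded_path` and its parameters are hypothetical, and it assumes the filename passed in is already the md5 hex string plus extension:

```python
def sharded_path(hashed_name: str, size: str = "48px",
                 base: str = "/profile/avatars") -> str:
    """Build a path whose folders come from the first 6 characters of the name."""
    stem = hashed_name.split(".", 1)[0]
    if len(stem) < 6:
        raise ValueError("hashed filename must have at least 6 characters")
    # Split the first six characters into three two-character folder names.
    shards = [stem[i:i + 2] for i in range(0, 6, 2)]
    return "/".join([base, *shards, size, hashed_name])


print(sharded_path("44f247d4e3f646c66d4d0337c6d415eb.jpg"))
# /profile/avatars/44/f2/47/48px/44f247d4e3f646c66d4d0337c6d415eb.jpg
```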

However, our new app is using Amazon S3 with CloudFront.

My understanding is that any folders you create on Amazon S3 are actually just references and are not directories on the filesystem.

If that is correct, is it still recommended to split files into folders/directories using the above or a similar method? Or can we simply remove this complexity from our application code and provide image links like so:

/profile/avatars/48px/filename.jpg

Bear in mind that this app is intended to serve tens of millions of photos.

Any guidance would be greatly appreciated.

asked Oct 23 '13 12:10 by gordyr



2 Answers

Although S3 folders are basically only another way of writing the key name (as @E.J.Brennan already said in his answer), there are reasons to think about the naming structure of your "folders".

Given the number of photos and your likely access patterns, it may make sense to speed up S3 key name lookups by making sure that operations on photos are spread out over multiple partitions. There is a great article on the AWS blog explaining all the details.
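One way to spread keys out, assuming partitioning is driven by the leading characters of the key name, is to start each key with a few characters of a hash. The sketch below is only an illustration: the function name `partitioned_key`, the 4-character prefix length, and the key layout are all assumptions, and whether this is needed at all depends on your request rates and on how S3 currently partitions keys.

```python
import hashlib


def partitioned_key(filename: str, size: str = "48px", prefix_len: int = 4) -> str:
    """Prepend a short hash of the filename so keys do not share one long prefix."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # The hash prefix comes first, so neighbouring uploads get unrelated key prefixes.
    return f"{digest[:prefix_len]}/profile/avatars/{size}/{filename}"


key = partitioned_key("photo.jpg")
print(key)
```

Because the prefix is derived from the filename, the application can recompute the full key at any time without storing it separately.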

answered Oct 12 '22 01:10 by j0nes


You don't need to set up that structure on S3 unless you are doing it for your own convenience. All of the folders you create on S3 are really just an illusion for you; the files are stored in one big continuous container. So if you don't have a reason to organize the files in a pseudo-folder hierarchy, then don't bother.
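To make the "illusion" concrete, here is a small in-memory model of a flat key namespace. It does not call S3; `list_flat` is a hypothetical helper that only imitates how a listing with a prefix and a delimiter can present flat keys as folders:

```python
def list_flat(keys, prefix="", delimiter="/"):
    """Model of a prefix/delimiter listing over a flat set of keys."""
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is reported as a "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = {
    "profile/avatars/48px/a.jpg",
    "profile/avatars/48px/b.jpg",
    "profile/avatars/96px/a.jpg",
}
objects, folders = list_flat(keys, prefix="profile/avatars/")
print(objects)  # []
print(folders)  # ['profile/avatars/48px/', 'profile/avatars/96px/']
```

The "folders" exist only in the listing output; the stored data is just the three keys.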

If you needed to control access for different groups of people based on your folder structure, that might be a reason to keep the structure, but besides that there probably isn't a benefit.
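As an example of prefix-based access control, the sketch below builds an IAM-style policy document that allows reads only under one key prefix. The bucket name `example-bucket`, the prefix, and the helper name `read_only_prefix_policy` are hypothetical; treat it as a starting point rather than a complete policy.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Build a policy document allowing object reads under a single key prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # The trailing /* limits the grant to keys that start with the prefix.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*"],
            }
        ],
    }


policy = read_only_prefix_policy("example-bucket", "profile/avatars")
print(json.dumps(policy, indent=2))
```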

answered Oct 12 '22 00:10 by E.J. Brennan