In order to support millions of potential images we have previously followed this sort of directory structure:
/profile/avatars/44/f2/47/48px/44f247d4e3f646c66d4d0337c6d415eb.jpg
The filename is md5 hashed, then we extract the first 6 characters in the string and build the folder structure from that.
So in the above example the filename:
44f247d4e3f646c66d4d0337c6d415eb.jpg
produces a directory structure of:
/44/f2/47/
We always did this in order to minimize the number of photos in any single directory, ultimately to aid filesystem performance.
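As a rough illustration of the scheme described above, here is a minimal Python sketch. The function names, the default `/profile/avatars` root, and the choice to hash the original file name (rather than the file contents) are assumptions made for the example, not details confirmed by the question.

```python
import hashlib


def hashed_filename(original_name: str, extension: str = "jpg") -> str:
    """Return an MD5-based file name (assumption: the original name is what gets hashed)."""
    digest = hashlib.md5(original_name.encode("utf-8")).hexdigest()
    return f"{digest}.{extension}"


def sharded_path(filename: str, size: str = "48px", root: str = "/profile/avatars") -> str:
    """Build the nested path from the first 6 characters of the hashed file name."""
    # Split the first six characters into three two-character directory names.
    shards = [filename[i:i + 2] for i in range(0, 6, 2)]
    return "/".join([root, *shards, size, filename])


print(sharded_path("44f247d4e3f646c66d4d0337c6d415eb.jpg"))
# /profile/avatars/44/f2/47/48px/44f247d4e3f646c66d4d0337c6d415eb.jpg
```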
However, our new app is using Amazon S3 with CloudFront.
My understanding is that any folders you create on Amazon S3 are actually just references and are not directories on the filesystem.
If that is correct, is it still recommended to split files into folders/directories using the above or a similar method? Or can we simply remove this complexity from our application code and provide image links like so:
/profile/avatars/48px/filename.jpg
Bear in mind that this app is intended to serve tens of millions of photos.
Any guidance would be greatly appreciated.
Amazon S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. It does this by using a shared name prefix for objects (that is, objects have names that begin with a common string).
Object names are also referred to as key names.
When you use the Amazon S3 console to create a folder, Amazon S3 creates a 0-byte object with a key that's set to the folder name that you provided. For example, if you create a folder named photos in your bucket, the Amazon S3 console creates a 0-byte object with the key photos/. The console creates this object to support the idea of folders.
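To make the "folders are just shared prefixes" idea concrete, the sketch below imitates in plain Python how a listing with a prefix and a delimiter groups a flat set of keys into apparent folders. It is a local simulation under stated assumptions, not a call to the S3 API; the `list_folder` function and the sample keys are hypothetical.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Group flat keys the way a prefix + delimiter listing presents them.

    Returns (objects, common_prefixes): keys directly "inside" the prefix,
    and the apparent sub-folders rolled up from deeper keys.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix, so this key sits in a "sub-folder".
            common_prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# A flat key space; "photos/" is the 0-byte placeholder the console would create.
keys = ["photos/", "photos/2023/cat.jpg", "photos/dog.jpg", "readme.txt"]

print(list_folder(keys))
# (['readme.txt'], ['photos/'])
print(list_folder(keys, prefix="photos/"))
# (['photos/', 'photos/dog.jpg'], ['photos/2023/'])
```

Nothing in the key list is a directory. The "folders" only appear because of how the keys are grouped when listed.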
The client calls an API that tells the server a file needs to be uploaded, without sending the file itself. The server, using its API keys, makes the corresponding request to S3. The file name, type, size, and other metadata are sent to S3 in this phase.
Although S3 folders are basically only another way of writing the key name (as @E.J.Brennan already said in his answer), there are reasons to think about the naming structure of your "folders".
Given the number of photos you have, and probably your access patterns, it might make sense to think about ways to speed up S3 key name lookups by making sure that operations on photos are spread out over multiple partitions. There is a great article on the AWS blog explaining all the details.
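One way to spread keys across many prefixes is to start each key with a few characters of a hash. The sketch below is only an illustration of that idea: the `prefixed_key` function, the key layout, and the two-character prefix length are assumptions, and whether this is worthwhile depends on your request rates and on current S3 guidance.

```python
import hashlib


def prefixed_key(filename: str, size: str = "48px", prefix_len: int = 2) -> str:
    """Build an S3 key that begins with a short hash-derived prefix (hypothetical layout)."""
    # The first hex characters of the MD5 digest are roughly uniformly distributed,
    # so keys are spread across 16 ** prefix_len distinct leading prefixes.
    shard = hashlib.md5(filename.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{shard}/profile/avatars/{size}/{filename}"


for name in ("alice.jpg", "bob.jpg", "carol.jpg"):
    print(prefixed_key(name))
```

Because the prefix is derived from the file name, the application can recompute the full key from the name alone and does not need to store it separately.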
You don't need to set up that structure on S3 unless you are doing it for your own convenience. All of the folders you create on S3 are really just an illusion for you; the files are stored in one big continuous container. So if you don't have a reason to organize the files in a pseudo-folder hierarchy, then don't bother.
If you needed to control access for different groups of people based on your folder structure, that might be a reason to keep the structure, but besides that there probably isn't a benefit.