If I had a million images, would it be better to store them in some folder/sub-folder hierarchy or just dump them all straight into a bucket (without any folders)?
Would dumping all the images into a hierarchy-less bucket slow down LIST operations?
Is there a significant overhead in creating folders and sub folders on the fly and setting up their ACLs (programatically speaking)?
S3 provides unlimited scalability, and there is no official limit on the amount of data and number of objects you can store in an S3 bucket. The size limit for objects stored in a bucket is 5 TB.
Bucket policies are limited to 20 KB in size. You can use the AWS Policy Generator to create a bucket policy for your Amazon S3 bucket.
You can upload any number of objects to the bucket. By default, you can create up to 100 buckets in each of your AWS accounts.
In Amazon S3, folders are used to group objects and organize files. Unlike a traditional file system, Amazon S3 doesn't use hierarchy to organize its objects and files. Amazon S3 console supports the folder concept only as a means of grouping (and displaying) objects.
S3 doesn't respect hierarchical namespaces. Each bucket simply contains a number of mappings from key to object (along with associated metadata, ACLs and so on).
Even though your object's key might contain a '/', S3 treats the path as a plain string and puts all objects in a flat namespace.
In my experience, LIST operations do take (linearly) longer as object count increases, but this is probably a symptom of the increased I/O required on the Amazon servers, and down the wire to your client.
However, lookup times do not seem to increase with object count - it's most probably some sort of O(1) hashtable implementation on their end - so having many objects in the same bucket should be just as performant as small buckets for normal usage (i.e. not LISTs).
As for the ACL, grants can be set on the bucket and on each individual object. As there is no hierarchy, they're your only two options. Obviously, setting as many bucket-wide grants will massively reduce your admin headaches if you have millions of files, but remember you can only grant permissions, not revoke them, so the bucket-wide grants should be the maximal subset of the ACL for all its contents.
I'd recommend splitting into separate buckets for:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With