
Amazon AWS S3 file naming strategy for performance

I am currently utilizing the following naming scheme:

/#{bucket_name}/#{customer_name}/fi/le/na/filename.jpg

So a file named dsca007.jpg for customer bent would be stored here:

/images/bent/ds/ca/00/dsca007.jpg
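For illustration, the scheme above can be sketched as a small Python helper. This is only a sketch: the name `build_key` and the defaults of three two-character segments are assumptions read off the dsca007.jpg example, and the bucket name is left out of the key.

```python
def build_key(customer_name: str, filename: str, shards: int = 3, width: int = 2) -> str:
    """Build a key such as bent/ds/ca/00/dsca007.jpg (bucket name not included)."""
    # Split the first shards * width characters of the file name into
    # fixed-width, directory-like segments.
    parts = [filename[i:i + width] for i in range(0, shards * width, width)]
    # Drop empty segments so very short file names do not produce "//".
    parts = [p for p in parts if p]
    return "/".join([customer_name, *parts, filename])


if __name__ == "__main__":
    print(build_key("bent", "dsca007.jpg"))
```

Because the customer name comes first, every key for one customer shares a common prefix.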

But as I understand it, S3 prefers a different key format, like this:

[image not available]

Source: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

Source: https://www.youtube.com/watch?v=uXHw0Xae2ww#t=535

I understand that they want me to change my naming scheme to something like:

/#{bucket_name}/fi/le/na/#{customer_name}/filename.jpg

But then the presenter shows this slide in the YouTube video:

[image not available]

Does that mean that my first approach was fine?

I really liked the first structure better because I was able to get a list of the files owned by one customer. Is there a way to list all files for a customer with the second naming scheme?

It was also quite easy to figure out how much storage each customer was using by running this command:

s3cmd du s3://images/#{customer_name}

How can I do that with the new naming scheme?
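With the customer name no longer at the front of the key, a single prefix listing (which is what the s3cmd du command above relies on) cannot select one customer's files. One possible fallback, sketched here under assumptions, is to walk a full listing and group by the customer segment. The function `usage_by_customer` is hypothetical, the (key, size) pairs are assumed to come from whatever produces the listing, and the keys are assumed to look like fi/le/na/#{customer_name}/filename.jpg.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def usage_by_customer(objects: Iterable[Tuple[str, int]], shards: int = 3) -> Dict[str, int]:
    """Sum object sizes per customer for keys shaped fi/le/na/<customer>/<file>."""
    totals: Dict[str, int] = defaultdict(int)
    for key, size in objects:
        segments = key.split("/")
        # Expect `shards` prefix segments, then the customer, then the file name.
        if len(segments) < shards + 2:
            continue  # key does not follow the assumed layout
        totals[segments[shards]] += size
    return dict(totals)


if __name__ == "__main__":
    listing = [
        ("ds/ca/00/bent/dsca007.jpg", 100),
        ("ds/ca/01/bent/dsca018.jpg", 50),
    ]
    print(usage_by_customer(listing))
```

This scans every key in the bucket, so it is far more expensive than a prefix listing; keeping a separate index of keys per customer would be another option.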

Asked by Kasper Grubbe, Jan 30 '14 16:01




1 Answer

If you don't include a unique string when building a key, you'll soon run into key collisions, where a new upload overwrites an existing object.

dsca007.jpg is not very unique. There is a good chance that another file with the same name will be uploaded to your S3 bucket. In that case you'll either lose the old file or get an error, depending on your configuration.

For example, this is what we are using: https://s3.amazonaws.com/bucket_name/user_media/videos/screenshots/cmXRyLRQxe9R139023426817_vid001.jpeg

Here the prefix cmXRyLRQxe9R139023426817_ is a string we generate ourselves and prepend to the original file name, vid001.jpeg, before uploading to S3.
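A minimal sketch of that idea in Python follows. How the prefix above is actually generated is not stated, so this swaps in a random hex token plus a millisecond timestamp. The function name `unique_key` and the default folder are assumptions made for the example.

```python
import secrets
import time


def unique_key(original_name: str, folder: str = "user_media/videos/screenshots") -> str:
    """Prepend a hard-to-collide prefix to the original file name."""
    # 6 random bytes rendered as 12 hex characters (assumed token length).
    token = secrets.token_hex(6)
    # Millisecond timestamp; it also records roughly when the key was made.
    stamp = int(time.time() * 1000)
    return f"{folder}/{token}{stamp}_{original_name}"


if __name__ == "__main__":
    print(unique_key("vid001.jpeg"))
```

Two uploads of a file with the same name then end up under different keys, so neither overwrites the other.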

Answered by Amit Talmor, Oct 03 '22 06:10