
Why are S3 and Google Storage bucket names a global namespace?

This has me puzzled. I can obviously understand why account IDs are global, but why bucket names?

Wouldn't it make more sense to have something like: https://accountID.storageservice.com/bucketName

Which would namespace buckets under accountID.

What am I missing? Why did these obviously elite architects choose to handle bucket names this way?

AJB asked Jun 09 '14 01:06



2 Answers

“The bucket namespace is global - just like domain names”

— http://aws.amazon.com/articles/1109#02

That's more than coincidental.

The reason seems simple enough: buckets and their objects can be accessed through a custom hostname that's the same as the bucket name... and a bucket can optionally host an entire static web site -- with S3 automatically mapping requests from the incoming Host: header onto the bucket of the same name.

In S3, these variant URLs all reference the same object "foo.txt" in the bucket "bucket.example.com". The first one requires either static website hosting enabled plus a DNS CNAME (or Alias in Route 53) pointing to the bucket's website endpoint, or a DNS CNAME pointing to the regional REST endpoint; the others require no configuration:

http://bucket.example.com/foo.txt
http://bucket.example.com.s3.amazonaws.com/foo.txt
http://bucket.example.com.s3[-region].amazonaws.com/foo.txt
http://s3[-region].amazonaws.com/bucket.example.com/foo.txt

If an object store service needs a simple mechanism to resolve the Host: header of an incoming HTTP request into a bucket name, the bucket namespace also needs to be global. Anything else, it seems, would complicate the implementation significantly.

For hostnames to be mappable to bucket names, something has to be globally unique, since obviously no two buckets could respond to the same hostname. The restriction being applied to the bucket name itself leaves no room for ambiguity.
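To make the Host: header argument concrete, here is a minimal sketch of how a service might map a request onto a bucket. It is only an illustration, not how S3 is actually implemented: the function name resolve_bucket is hypothetical, and the endpoint pattern is an assumption based solely on the legacy s3[-region].amazonaws.com URLs listed above.

```python
import re

# Assumption: endpoints look like the legacy URLs above, i.e.
# "s3.amazonaws.com" or "s3-<region>.amazonaws.com", optionally
# prefixed with "<bucket>.".
_S3_ENDPOINT = re.compile(
    r"^(?:(?P<bucket>.+)\.)?s3(?:-[a-z0-9-]+)?\.amazonaws\.com$"
)


def resolve_bucket(host, path):
    """Return (bucket, key) for a request's Host header and URL path.

    Hypothetical helper for illustration only.
    """
    host = host.lower().split(":")[0]  # ignore any port
    key = path.lstrip("/")

    match = _S3_ENDPOINT.match(host)
    if match is None:
        # Custom hostname: the Host header itself is the bucket name.
        # This lookup is only unambiguous if bucket names are global.
        return host, key
    if match.group("bucket"):
        # Bucket name is a prefix of the service hostname.
        return match.group("bucket"), key
    # Bare service endpoint: the first path segment is the bucket.
    bucket, _, key = key.partition("/")
    return bucket, key


if __name__ == "__main__":
    for host, path in [
        ("bucket.example.com", "/foo.txt"),
        ("bucket.example.com.s3.amazonaws.com", "/foo.txt"),
        ("bucket.example.com.s3-us-west-2.amazonaws.com", "/foo.txt"),
        ("s3-us-west-2.amazonaws.com", "/bucket.example.com/foo.txt"),
    ]:
        print(host, path, "->", resolve_bucket(host, path))
```

Note that no account ID appears anywhere in the lookup. If bucket names were scoped per account, the custom-hostname branch would need some extra table to decide which account's "bucket.example.com" was meant.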

It also seems likely that many potential clients wouldn't like to have their account identified in bucket names.

Of course, you could always add your account id, or any random string, to your desired bucket name, e.g. jozxyqk-payroll, jozxyqk-personnel, if the bucket name you wanted wasn't available.
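As a rough sketch of that workaround, the snippet below prepends a prefix (a random one if none is given) and checks the result against a simplified version of the bucket naming rules: 3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit. The helper name prefixed_bucket_name is made up for this example, the check does not cover every naming rule, and it cannot tell you whether the name is actually still available.

```python
import re
import secrets

# Simplified naming check (assumption: not the complete rule set).
_VALID_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def prefixed_bucket_name(desired, prefix=None):
    """Return '<prefix>-<desired>', generating a random prefix if needed.

    Hypothetical helper for illustration only.
    """
    if prefix is None:
        prefix = secrets.token_hex(4)  # 8 random hex characters
    name = f"{prefix}-{desired}".lower()
    if not _VALID_NAME.match(name):
        raise ValueError(f"not a valid bucket name: {name!r}")
    return name


if __name__ == "__main__":
    print(prefixed_bucket_name("payroll", "jozxyqk"))
    print(prefixed_bucket_name("personnel"))  # random prefix each run
```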

Michael - sqlbot answered Oct 01 '22 00:10


The more I drink, the more sense the concept below makes, so I've promoted it from a comment on the accepted answer to its own answer:

An additional thought that popped into my head randomly tonight:

Given the ability to use the generic host names that the various object store services provide, one could easily obscure one's corporate (or other) identity as the owner of any given data resource.

So, let's say Black Hat Corp hosts a data resource at http://s3.amazonaws.com/obscure-bucket-name/something-to-be-dissassociated.txt.

It would be very difficult for any non-governmental entity to determine who the owner of that resource is without cooperation from the object store provider.

Not nefarious by design, just objective pragmatism.

And possibly a stroke of brilliance by the architects of this paradigm.

AJB answered Sep 30 '22 22:09