Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I translate an AWS S3 url into a bucket name for boto?

I'm trying to access the http://s3.amazonaws.com/commoncrawl/parse-output/segment/ bucket with boto. I can't figure out how to translate this into a name for boto.s3.bucket.Bucket().

This is the gist of what I'm going for:

s3 = boto.connect_s3()
cc = boto.s3.bucket.Bucket(connection=s3, name='commoncrawl/parse-output/segment')
requester = {'x-amz-request-payer':'requester'}
contents = cc.list(headers=requester)
for i,item in enumerate(contents):
    print item.__repr__()

I get "boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request ... The specified bucket is not valid..."

like image 515
Kevin Avatar asked Dec 01 '22 22:12

Kevin


1 Answers

The AWS documents list four possible url formats for S3 -- here's something I just threw together to extract the bucket and region for all of the different url formats.

import re

def bucket_name_from_url(url):
    """ Gets bucket name and region from url, matching any of the different formats for S3 urls 
    * http://bucket.s3.amazonaws.com
    * http://bucket.s3-aws-region.amazonaws.com
    * http://s3.amazonaws.com/bucket
    * http://s3-aws-region.amazonaws.com/bucket

    returns bucket name, region
    """       
    match =  re.search('^https?://(.+).s3.amazonaws.com/', url)
    if match:
        return match.group(1), None

    match =  re.search('^https?://(.+).s3-([^.]+).amazonaws.com/', url)
    if match:
        return match.group(1), match.group(2)

    match = re.search('^https?://s3.amazonaws.com/([^\/]+)', url)
    if match:
        return match.group(1), None

    match =  re.search('^https?://s3-([^.]+).amazonaws.com/([^\/]+)', url)
    if match:
        return match.group(2), match.group(1)

    return None, None

Something like this should really go into boto ... Amazon, I hope you're listening

EDIT 10/10/2018: The bucket regexes should now capture bucket names with periods.

like image 58
Mark Chackerian Avatar answered Dec 25 '22 09:12

Mark Chackerian