s3 urls - get bucket name and path

Tags:

python

boto3

I have a variable that holds an AWS S3 URL:

s3://bucket_name/folder1/folder2/file1.json 

I want to get the bucket_name in one variable and the rest, i.e. /folder1/folder2/file1.json, in another variable. I tried a regular expression and could get the bucket_name as shown below, but I'm not sure whether there is a better way.

import re
m = re.search(r'(?<=s3://)[^/]+', 's3://bucket_name/folder1/folder2/file1.json')
print(m.group(0))

How do I get the rest, i.e. folder1/folder2/file1.json?

I checked whether boto3 has a feature to extract the bucket name and key from the URL, but couldn't find one.

Lijju Mathew asked Mar 07 '17 06:03



1 Answer

Since it's just a normal URL, you can use urlparse to get all the parts of the URL.

>>> from urlparse import urlparse
>>> o = urlparse('s3://bucket_name/folder1/folder2/file1.json', allow_fragments=False)
>>> o
ParseResult(scheme='s3', netloc='bucket_name', path='/folder1/folder2/file1.json', params='', query='', fragment='')
>>> o.netloc
'bucket_name'
>>> o.path
'/folder1/folder2/file1.json'

The path starts with a slash, which is not part of the S3 key, so you may have to strip it:

o.path.lstrip('/') 

In Python 3, urlparse moved to urllib.parse, so use:

from urllib.parse import urlparse 
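
Putting the steps above together, here is a minimal sketch for Python 3. The helper name split_s3_url is made up for this illustration and is not part of boto3 or the standard library; it assumes the key contains no '?' character.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Return (bucket, key) for an s3:// URL.

    Hypothetical helper for illustration; not a boto3 function.
    """
    parsed = urlparse(url, allow_fragments=False)
    if parsed.scheme != "s3":
        raise ValueError("not an s3:// URL: %r" % url)
    # netloc holds the bucket; the path keeps a leading slash
    # that is not part of the key, so strip it.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url("s3://bucket_name/folder1/folder2/file1.json")
print(bucket)  # bucket_name
print(key)     # folder1/folder2/file1.json
```

Keys that contain a '?' need the extra handling done by the class below.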

Here's a class that takes care of all the details.

try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse


class S3Url(object):
    """
    >>> s = S3Url("s3://bucket/hello/world")
    >>> s.bucket
    'bucket'
    >>> s.key
    'hello/world'
    >>> s.url
    's3://bucket/hello/world'

    >>> s = S3Url("s3://bucket/hello/world?qwe1=3#ddd")
    >>> s.bucket
    'bucket'
    >>> s.key
    'hello/world?qwe1=3#ddd'
    >>> s.url
    's3://bucket/hello/world?qwe1=3#ddd'

    >>> s = S3Url("s3://bucket/hello/world#foo?bar=2")
    >>> s.key
    'hello/world#foo?bar=2'
    >>> s.url
    's3://bucket/hello/world#foo?bar=2'
    """

    def __init__(self, url):
        self._parsed = urlparse(url, allow_fragments=False)

    @property
    def bucket(self):
        return self._parsed.netloc

    @property
    def key(self):
        if self._parsed.query:
            return self._parsed.path.lstrip('/') + '?' + self._parsed.query
        else:
            return self._parsed.path.lstrip('/')

    @property
    def url(self):
        return self._parsed.geturl()
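
The key property re-attaches the query string because urlparse splits the URL at the first '?', even though that character may be part of an S3 key. A short sketch of the effect, assuming Python 3 and reusing the example URL from the docstring:

```python
from urllib.parse import urlparse

parsed = urlparse("s3://bucket/hello/world?qwe1=3#ddd", allow_fragments=False)

# urlparse treats everything after the first '?' as the query string,
# so the path alone is missing part of the object key.
print(parsed.path)   # /hello/world
print(parsed.query)  # qwe1=3#ddd

# Re-attaching the query, as the key property does, recovers the full key.
key = parsed.path.lstrip("/")
if parsed.query:
    key += "?" + parsed.query
print(key)           # hello/world?qwe1=3#ddd
```

Passing allow_fragments=False keeps the '#ddd' part from being split off as a fragment, so it stays in the query and ends up back in the key.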
kichik answered Sep 22 '22 14:09