s3 urls - get bucket name and path

Tags:

boto3

I have a variable that holds an AWS S3 URL:

s3://bucket_name/folder1/folder2/file1.json

I want to get the bucket_name in one variable and the rest, i.e. /folder1/folder2/file1.json, in another variable. I tried regular expressions and could get the bucket_name as shown below, but I'm not sure if there is a better way.

import re

m = re.search(r'(?<=s3://)[^/]+', 's3://bucket_name/folder1/folder2/file1.json')
print(m.group(0))

How do I get the rest, i.e. folder1/folder2/file1.json?

I have checked if there is a boto3 feature to extract the bucket_name and key from the url, but couldn't find it.

241

asked Mar 07 '17 06:03

Lijju Mathew

1 Answer

Since it's just a normal URL, you can use urlparse to get all the parts of the URL.

>>> from urlparse import urlparse
>>> o = urlparse('s3://bucket_name/folder1/folder2/file1.json', allow_fragments=False)
>>> o
ParseResult(scheme='s3', netloc='bucket_name', path='/folder1/folder2/file1.json', params='', query='', fragment='')
>>> o.netloc
'bucket_name'
>>> o.path
'/folder1/folder2/file1.json'

The path keeps its leading slash, so you may have to strip it to get the key:

o.path.lstrip('/')

In Python 3, urlparse moved to urllib.parse, so use:

from urllib.parse import urlparse
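Putting those pieces together for Python 3, a minimal sketch could look like the following. The function name split_s3_url and the scheme check are my own additions for illustration, not part of boto3 or the standard library, and the sketch ignores keys that contain a "?" character.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Return (bucket, key) for an s3:// URL (hypothetical helper)."""
    parsed = urlparse(url, allow_fragments=False)
    # Reject anything that is not of the form s3://<bucket>/...
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URL: %r" % url)
    # netloc is the bucket; the path minus its leading slash is the key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url("s3://bucket_name/folder1/folder2/file1.json")
print(bucket)  # bucket_name
print(key)     # folder1/folder2/file1.json
```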

Here's a class that takes care of all the details.

try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse


class S3Url(object):
    """
    >>> s = S3Url("s3://bucket/hello/world")
    >>> s.bucket
    'bucket'
    >>> s.key
    'hello/world'
    >>> s.url
    's3://bucket/hello/world'

    >>> s = S3Url("s3://bucket/hello/world?qwe1=3#ddd")
    >>> s.bucket
    'bucket'
    >>> s.key
    'hello/world?qwe1=3#ddd'
    >>> s.url
    's3://bucket/hello/world?qwe1=3#ddd'

    >>> s = S3Url("s3://bucket/hello/world#foo?bar=2")
    >>> s.key
    'hello/world#foo?bar=2'
    >>> s.url
    's3://bucket/hello/world#foo?bar=2'
    """

    def __init__(self, url):
        self._parsed = urlparse(url, allow_fragments=False)

    @property
    def bucket(self):
        return self._parsed.netloc

    @property
    def key(self):
        if self._parsed.query:
            return self._parsed.path.lstrip('/') + '?' + self._parsed.query
        else:
            return self._parsed.path.lstrip('/')

    @property
    def url(self):
        return self._parsed.geturl()
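If you would rather stay with the regular-expression approach from the question, it can be extended with a second group to capture the key as well. This is only a sketch: the pattern and the name S3_URL_RE are my own, and urlparse is still the more robust option.

```python
import re

# Hypothetical pattern: bucket is everything up to the first slash after
# "s3://", key is whatever follows that slash (possibly empty).
S3_URL_RE = re.compile(r"^s3://(?P<bucket>[^/]+)/?(?P<key>.*)$")

m = S3_URL_RE.match("s3://bucket_name/folder1/folder2/file1.json")
if m:
    print(m.group("bucket"))  # bucket_name
    print(m.group("key"))     # folder1/folder2/file1.json
```

Because the key group simply takes the rest of the string, characters such as "?" and "#" stay in the key without any special handling.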

189

answered Sep 22 '22 14:09

kichik
