I have a variable which has the aws s3 url
s3://bucket_name/folder1/folder2/file1.json
I want to get the bucket_name in a variables and rest i.e /folder1/folder2/file1.json in another variable. I tried the regular expressions and could get the bucket_name like below, not sure if there is a better way.
m = re.search('(?<=s3:\/\/)[^\/]+', 's3://bucket_name/folder1/folder2/file1.json') print(m.group(0))
How do I get the rest i.e - folder1/folder2/file1.json ?
I have checked if there is a boto3 feature to extract the bucket_name and key from the url, but couldn't find it.
Get an S3 Object's URL #Navigate to the AWS S3 console and click on your bucket's name. Use the search input to find the object if necessary. Click on the checkbox next to the object's name. Click on the Copy URL button.
In Amazon S3, path-style URLs use the following format: https://s3. region-code .amazonaws.com/ bucket-name / key-name.
By default, the base URL is set to s3.amazonaws.com.
The following rules apply for naming buckets in Amazon S3: Bucket names must be between 3 (min) and 63 (max) characters long. Bucket names can consist only of lowercase letters, numbers, dots (.), and hyphens (-). Bucket names must begin and end with a letter or number.
Since it's just a normal URL, you can use urlparse
to get all the parts of the URL.
>>> from urlparse import urlparse >>> o = urlparse('s3://bucket_name/folder1/folder2/file1.json', allow_fragments=False) >>> o ParseResult(scheme='s3', netloc='bucket_name', path='/folder1/folder2/file1.json', params='', query='', fragment='') >>> o.netloc 'bucket_name' >>> o.path '/folder1/folder2/file1.json'
You may have to remove the beginning slash from the key as the next answer suggests.
o.path.lstrip('/')
With Python 3 urlparse
moved to urllib.parse
so use:
from urllib.parse import urlparse
Here's a class that takes care of all the details.
try: from urlparse import urlparse except ImportError: from urllib.parse import urlparse class S3Url(object): """ >>> s = S3Url("s3://bucket/hello/world") >>> s.bucket 'bucket' >>> s.key 'hello/world' >>> s.url 's3://bucket/hello/world' >>> s = S3Url("s3://bucket/hello/world?qwe1=3#ddd") >>> s.bucket 'bucket' >>> s.key 'hello/world?qwe1=3#ddd' >>> s.url 's3://bucket/hello/world?qwe1=3#ddd' >>> s = S3Url("s3://bucket/hello/world#foo?bar=2") >>> s.key 'hello/world#foo?bar=2' >>> s.url 's3://bucket/hello/world#foo?bar=2' """ def __init__(self, url): self._parsed = urlparse(url, allow_fragments=False) @property def bucket(self): return self._parsed.netloc @property def key(self): if self._parsed.query: return self._parsed.path.lstrip('/') + '?' + self._parsed.query else: return self._parsed.path.lstrip('/') @property def url(self): return self._parsed.geturl()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With