How to access keys from buckets with periods (.) in their names using boto3?

Context

I am trying to get the encryption status of all of my buckets for a security report. However, since encryption is set on a per-key basis, I want to iterate through all of the keys and derive a general encryption status for each bucket: "yes" if all keys are encrypted, "no" if none are encrypted, and "partially" if only some are encrypted.
I must use boto3 because there is a known issue with boto where the encryption status for each key always returns None. See here.
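
The three-way status described above can be sketched as a small helper. This is only an illustration: the function name `summarize_encryption` is hypothetical, and the "empty" result for a bucket without keys is an assumption, since the question does not say how such a bucket should be reported.

```python
def summarize_encryption(statuses):
    """Collapse per-key encryption values into one bucket-level status.

    `statuses` is an iterable with one entry per key: a truthy value
    (for example 'AES256' or 'aws:kms') when the key is encrypted,
    and None when it is not.
    """
    total = 0
    encrypted = 0
    for status in statuses:
        total += 1
        if status:
            encrypted += 1

    if total == 0:
        return "empty"  # assumption: this case is not defined in the question
    if encrypted == total:
        return "yes"
    if encrypted == 0:
        return "no"
    return "partially"
```

With boto3, the per-key values would presumably come from the `server_side_encryption` attribute of each `s3_resource.Object(bucket.name, obj.key)`, which is None for unencrypted keys.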

Problem

I am trying to iterate over all the keys in each of my buckets using boto3. The following code works fine until it runs into buckets with names that contain periods, such as "my.test.bucket".

from boto3.session import Session

# Replace the placeholder strings with real credentials
session = Session(aws_access_key_id='<ACCESS_KEY>',
                  aws_secret_access_key='<SECRET_KEY>',
                  aws_session_token='<TOKEN>')
s3_resource = session.resource('s3')

for bucket in s3_resource.buckets.all():
    for obj in bucket.objects.all():
        key = s3_resource.Object(bucket.name, obj.key)
        # Do some stuff with the key...

When it hits a bucket with a period in its name, iterating over bucket.objects.all() raises the exception below, which tells me to send all requests to a specific endpoint. That endpoint can be found in the exception object that is thrown.

for obj in bucket.objects.all():
File "/usr/local/lib/python2.7/site-packages/boto3/resources/collection.py", line 82, in __iter__
for page in self.pages():
File "/usr/local/lib/python2.7/site-packages/boto3/resources/collection.py", line 165, in pages
for page in pages:
File "/usr/lib/python2.7/dist-packages/botocore/paginate.py", line 85, in __iter__
response = self._make_request(current_kwargs)
File "/usr/lib/python2.7/dist-packages/botocore/paginate.py", line 157, in _make_request
return self._method(**current_kwargs)
File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 310, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 395, in _make_api_call
raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (PermanentRedirect) when calling the ListObjects operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
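
As a rough sketch of how the endpoint could be pulled out of that exception: botocore exposes the parsed error as a dict on `ClientError.response`, with the error code under `['Error']['Code']`. The helper name `redirect_endpoint` is hypothetical, and the presence of an `'Endpoint'` entry in that dict is an assumption here, so the lookup is defensive.

```python
def redirect_endpoint(error_response):
    """Return the endpoint S3 suggests in a PermanentRedirect error, or None.

    `error_response` is the dict found on `ClientError.response`.
    Assumption: the parsed error carries an 'Endpoint' entry next to
    'Code' and 'Message'; .get() is used in case it does not.
    """
    error = error_response.get("Error", {})
    if error.get("Code") != "PermanentRedirect":
        return None
    return error.get("Endpoint")


# Hand-built stand-in for exc.response, so the sketch runs without AWS access.
sample = {
    "Error": {
        "Code": "PermanentRedirect",
        "Message": "The bucket you are attempting to access must be addressed "
                   "using the specified endpoint.",
        "Endpoint": "my.test.bucket.s3.amazonaws.com",
    }
}
print(redirect_endpoint(sample))
```

In the loop from the question, this would be called as `redirect_endpoint(exc.response)` inside an `except ClientError as exc:` block around the iteration.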

Things I have tried

  • Setting the endpoint_url parameter to the bucket endpoint specified in the exception response, like s3_resource = session.resource('s3', endpoint_url='my.test.bucket.s3.amazonaws.com')
  • Specifying the region the bucket is located in, like s3_resource = session.resource('s3', region_name='eu-west-1')

I believe the problem is similar to this Stack Overflow question about boto, where the fix is to set the calling_format parameter in the S3Connection constructor. Unfortunately, I can't use boto (see above).

Update

Here is what ended up working for me. It is not the most elegant approach, but it works =).

from boto3.session import Session

# Replace the placeholder strings with real credentials
session = Session(aws_access_key_id='<ACCESS_KEY>',
                  aws_secret_access_key='<SECRET_KEY>',
                  aws_session_token='<TOKEN>')
s3_resource = session.resource('s3')

# First get all the bucket names
bucket_names = [bucket.name for bucket in s3_resource.buckets.all()]


for bucket_name in bucket_names:
    # Check each name for a "." and use a different resource if needed
    if "." in bucket_name:
        # LocationConstraint is None for buckets in us-east-1
        region = session.client('s3').get_bucket_location(Bucket=bucket_name)['LocationConstraint'] or 'us-east-1'
        resource = session.resource('s3', region_name=region)
    else:
        resource = s3_resource
    bucket = resource.Bucket(bucket_name)

    # Continue as usual using this resource
    for obj in bucket.objects.all():
        key = resource.Object(bucket.name, obj.key)
        # Do some stuff with the key...
David Morales asked Nov 05 '15 00:11


1 Answer

Just generalizing the great answer provided by Ben.

import boto3
knownBucket = 'some.topLevel.BucketPath.withPeriods'
s3 = boto3.resource('s3')

# Look up the bucket's region (LocationConstraint is None for us-east-1)
region = s3.meta.client.get_bucket_location(Bucket=knownBucket)['LocationConstraint'] or 'us-east-1'

# Recreate the resource for that region
s3 = boto3.resource('s3', region_name=region)
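
One caveat with get_bucket_location is that LocationConstraint is not always a usable region name: it is None for buckets in us-east-1, and the legacy value 'EU' stands for eu-west-1. A minimal sketch of a normalizing helper follows; the name `region_from_location` is hypothetical and not part of boto3.

```python
def region_from_location(location_constraint):
    """Map a GetBucketLocation LocationConstraint to a region name."""
    if location_constraint is None:
        # Buckets in us-east-1 report no location constraint
        return "us-east-1"
    if location_constraint == "EU":
        # Legacy value used for buckets in eu-west-1
        return "eu-west-1"
    # Any other value is already a region name
    return location_constraint
```

The result could then be passed as region_name when creating the resource, instead of the raw LocationConstraint value.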
blehman answered Sep 30 '22 14:09
