List directory contents of an S3 bucket using Python and Boto3?

I am trying to list all directories within an S3 bucket using Python and Boto3.

I am using the following code:

s3 = session.resource('s3')  # I already have a boto3 Session object
bucket_names = [
    'this/bucket/',
    'that/bucket/'
]
for name in bucket_names:
    bucket = s3.Bucket(name)
    for obj in bucket.objects.all():  # this raises an exception
        pass  # handle obj

When I run this I get the following exception stack trace:

File "botolist.py", line 67, in <module>
  for obj in bucket.objects.all():
File "/Library/Python/2.7/site-packages/boto3/resources/collection.py", line 82, in __iter__
  for page in self.pages():
File "/Library/Python/2.7/site-packages/boto3/resources/collection.py", line 165, in pages
  for page in pages:
File "/Library/Python/2.7/site-packages/botocore/paginate.py", line 83, in __iter__
  response = self._make_request(current_kwargs)
File "/Library/Python/2.7/site-packages/botocore/paginate.py", line 155, in _make_request
  return self._method(**current_kwargs)
File "/Library/Python/2.7/site-packages/botocore/client.py", line 270, in _api_call
  return self._make_api_call(operation_name, kwargs)
File "/Library/Python/2.7/site-packages/botocore/client.py", line 335, in _make_api_call
  raise ClientError(parsed_response, operation_name)

botocore.exceptions.ClientError: An error occurred (NoSuchKey) when calling the ListObjects operation: The specified key does not exist.

What is the correct way to list directories inside a bucket?

asked Sep 17 '15 16:09

Allen Gooch

5 Answers

All these other responses leave something to be desired. Calling

client.list_objects()

on its own returns at most 1,000 results per call. The rest of the answers are either wrong or too complex.

Handling the continuation token yourself is error-prone. Just use a paginator, which takes care of that logic for you.

The solution you want is:

keys = [
    e['Key']
    for p in client.get_paginator('list_objects_v2').paginate(Bucket='my_bucket')
    for e in p.get('Contents', [])  # 'Contents' is absent when a page has no objects
]
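
As a sketch, the same paginator approach can be wrapped in a small generator so that keys are streamed rather than collected up front. `iter_keys` is a hypothetical helper name, not part of boto3, and it assumes `client` is a boto3 S3 client such as `boto3.client('s3')`.

```python
def iter_keys(client, bucket, prefix=''):
    """Yield every object key under `prefix`, one page at a time.

    `client` is assumed to be a boto3 S3 client; `iter_keys` is a
    hypothetical helper, not a boto3 function.
    """
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing from pages that hold no objects.
        for entry in page.get('Contents', []):
            yield entry['Key']
```

With a real client this would be called as `list(iter_keys(client, 'my_bucket'))`.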

answered Oct 23 '22 13:10

Henry Henrinson

If you have the session, create a client and read the CommonPrefixes from the client's list_objects response:

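client = session.client('s3',
                        # region_name='eu-west-1'
                        )

result = client.list_objects(Bucket='MyBucket', Delimiter='/')
# 'CommonPrefixes' is absent when the bucket has no top-level folders
for obj in result.get('CommonPrefixes', []):
    print(obj.get('Prefix'))  # replace with your own handling of the prefix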

There could be a lot of folders, and you might want to start in a subfolder, though. Something like this could handle that:

def folders(client, bucket, prefix=''):
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        # each CommonPrefixes entry is one "folder" directly under prefix
        for common_prefix in result.get('CommonPrefixes', []):
            yield common_prefix.get('Prefix')

gen_folders = folders(client, 'MyBucket')
list(gen_folders)

gen_subfolders = folders(client, 'MyBucket', prefix='MySubFolder/')
list(gen_subfolders)
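
S3 bucket names cannot contain '/', so a string such as 'this/bucket/' from the question has to be split into a bucket name and a key prefix before it is passed to a function like `folders`. A minimal sketch follows; `split_bucket_and_prefix` is a hypothetical helper, and it assumes the first path segment is the bucket name, which may not be what the question's author intended.

```python
def split_bucket_and_prefix(path):
    """Split 'bucket/some/prefix/' into ('bucket', 'some/prefix/').

    Assumption: the first '/'-separated segment is the bucket name and
    everything after it is a key prefix.
    """
    bucket, _, prefix = path.strip().lstrip('/').partition('/')
    if not bucket:
        raise ValueError('no bucket name in %r' % (path,))
    return bucket, prefix
```

The result could then be passed on as `folders(client, bucket, prefix=prefix)`.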

answered Oct 23 '22 14:10

Anne M.

Alternatively you may want to use boto3.client

Example

import boto3 
client = boto3.client('s3')
client.list_objects(Bucket='MyBucket')

list_objects also supports other arguments that might be required to iterate through the result: Bucket, Delimiter, EncodingType, Marker, MaxKeys, Prefix.
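
For instance, Marker can be used to page through list_objects results by hand. The sketch below is one way to do it, assuming `client` is a boto3 S3 client; `iter_keys_with_marker` is a hypothetical name. When no Delimiter is given the response carries no NextMarker, so the last key of the page is used as the next marker.

```python
def iter_keys_with_marker(client, bucket, prefix=''):
    """Yield all keys under `prefix` using list_objects and Marker.

    `client` is assumed to be a boto3 S3 client; this helper is
    hypothetical, not part of boto3.
    """
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        response = client.list_objects(**kwargs)
        contents = response.get('Contents', [])
        for entry in contents:
            yield entry['Key']
        if not response.get('IsTruncated') or not contents:
            break
        # NextMarker is only returned when a Delimiter is set; otherwise
        # the last key of this page marks where the next page starts.
        kwargs['Marker'] = response.get('NextMarker', contents[-1]['Key'])
```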

answered Oct 23 '22 12:10

Vor

The best way to get the list of ALL objects with a specific prefix in an S3 bucket is to use list_objects_v2 together with ContinuationToken, which gets around the limit of 1,000 objects per response.

import boto3
s3 = boto3.client('s3')

s3_bucket = 'your-bucket'
s3_prefix = 'your/prefix'
partial_list = s3.list_objects_v2(
        Bucket=s3_bucket, 
        Prefix=s3_prefix)
# 'Contents' is missing when no objects match, so default to an empty list
obj_list = partial_list.get('Contents', [])
while partial_list['IsTruncated']:
    next_token = partial_list['NextContinuationToken']
    partial_list = s3.list_objects_v2(
        Bucket=s3_bucket,
        Prefix=s3_prefix,
        ContinuationToken=next_token)
    obj_list.extend(partial_list.get('Contents', []))

answered Oct 23 '22 13:10

Behrooz

If you have fewer than 1,000 objects in your folder you can use the following code:

import boto3

s3 = boto3.client('s3')
object_listing = s3.list_objects_v2(Bucket='bucket_name',
                                    Prefix='folder/sub-folder/')
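
The response is a dict: the objects sit under 'Contents', and 'IsTruncated' tells you whether the fewer-than-1,000 assumption actually held. A minimal sketch of reading it, where `keys_from_listing` is a hypothetical helper name:

```python
def keys_from_listing(listing):
    """Return the keys in one list_objects_v2 response dict.

    Raises if the response was truncated, i.e. more objects exist than
    fit in a single response.
    """
    if listing.get('IsTruncated'):
        raise RuntimeError('listing is truncated; paginate instead')
    # 'Contents' is absent when nothing matches the prefix.
    return [entry['Key'] for entry in listing.get('Contents', [])]
```

It would be called as `keys_from_listing(object_listing)`.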

answered Oct 23 '22 14:10

Toby

Tags:

python

amazon-web-services

amazon-s3

boto3