
Boto3: grabbing only selected objects from the S3 resource

I can grab and read all the objects in my AWS S3 bucket via

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
all_objs = bucket.objects.all()
for obj in all_objs:
    # filter only the objects I need
    pass

and then

obj.key

would give me the path within the bucket.

Is there a way to filter beforehand for only those objects whose key starts with a given path (a directory in the bucket), so that I can avoid looping over all the objects and filtering afterwards?

Asked by mar tin, Mar 24 '16 20:03


3 Answers

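Use the filter method of a collection such as bucket.objects. Its Prefix argument is passed on to S3, so only keys that start with the given string are listed.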

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objs = bucket.objects.filter(Prefix='myprefix')
for obj in objs:
    pass
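
If the prefix alone is not selective enough, the rest of the filtering has to happen client-side, because the listing call has no suffix or pattern parameter. A minimal sketch, assuming a hypothetical helper named `keys_with_suffix` and that `bucket` is a boto3 `Bucket` resource (or any object exposing `objects.filter(Prefix=...)`):

```python
def keys_with_suffix(bucket, prefix, suffix):
    """Return the keys under `prefix` that end with `suffix`.

    The Prefix filter is applied by S3 during listing;
    the suffix check runs locally on each returned key.
    """
    return [
        obj.key
        for obj in bucket.objects.filter(Prefix=prefix)
        if obj.key.endswith(suffix)
    ]
```

With a real bucket this might be called as `keys_with_suffix(boto3.resource('s3').Bucket('my-bucket'), 'myprefix/', '.csv')`; the `.csv` suffix is only an illustration.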
Answered by Ilja Everilä, Oct 20 '22 06:10


For those using boto3.client('s3') rather than boto3.resource('s3'), pass the 'Prefix' parameter to list_objects_v2 to restrict the listing to the matching objects in the S3 bucket.

import boto3

s3 = boto3.client('s3')

params = {
    "Bucket": "HelloWorldBucket",
    "Prefix": "Happy"
}

happy_objects = s3.list_objects_v2(**params)

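The snippet above returns a response whose 'Contents' list holds the objects in 'HelloWorldBucket' whose keys start with 'Happy'. To match only the contents of a 'Happy' folder, use the prefix 'Happy/'; the bare prefix 'Happy' also matches a key such as 'HappyBirthday.txt'.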

PS: a folder in S3 is just a convention; it is implemented as a prefix of the object key.
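
Since a prefix is compared against the key as a plain string, a trailing slash changes what matches. The following is a local simulation with made-up keys, not an S3 call; `match_prefix` is a hypothetical helper that mimics the comparison with `str.startswith`:

```python
# Made-up keys for illustration only.
keys = ["Happy/a.txt", "Happy/2016/b.txt", "HappyBirthday.txt", "Sad/c.txt"]

def match_prefix(keys, prefix):
    """Mimic S3 prefix matching: keep keys that begin with `prefix`."""
    return [k for k in keys if k.startswith(prefix)]

print(match_prefix(keys, "Happy"))
# ['Happy/a.txt', 'Happy/2016/b.txt', 'HappyBirthday.txt']
print(match_prefix(keys, "Happy/"))
# ['Happy/a.txt', 'Happy/2016/b.txt']
```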

Answered by Gru, Oct 20 '22 06:10


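If you just need the list of object keys, bucket.objects.filter is a more convenient alternative to list_objects or list_objects_v2: those calls return at most 1000 objects per request, so larger listings have to be paginated by hand, whereas the resource collection pages through the results for you. Reference: list_objects_v2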
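
If you would rather stay with the client, a paginator issues the follow-up requests for you. A minimal sketch, assuming a hypothetical helper named `iter_keys` and that `client` is a boto3 S3 client:

```python
def iter_keys(client, bucket_name, prefix):
    """Yield every key under `prefix`, following pagination.

    `client` is expected to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

It might be used as `for key in iter_keys(boto3.client('s3'), 'my-bucket', 'myprefix/'): ...`. Because it is a generator, keys are produced page by page rather than collected up front.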

Answered by Lavesh, Oct 20 '22 06:10