
Search specific file in AWS S3 bucket using python

I have access to an AWS S3 bucket that holds nearly 300 files. I need to download a single file from it by pattern matching or search, because I do not know the exact filename (say, the file ends with .csv).
Here is my sample code, which lists all files in the bucket:

import os
import xml.etree.ElementTree as ET

from boto.exception import S3ResponseError
from boto.s3.connection import S3Connection


def s3connection(credentialsdict):
    """
    Print the name of every key in the billing bucket.

    :param credentialsdict: dict with "access_key", "secret_key" and
        "bucket_name" entries used to establish the S3 connection
    """
    os.environ['S3_USE_SIGV4'] = 'True'
    conn = S3Connection(credentialsdict["access_key"], credentialsdict["secret_key"], host='s3.amazonaws.com')
    billing_bucket = conn.get_bucket(credentialsdict["bucket_name"], validate=False)
    try:
        billing_bucket.get_location()
    except S3ResponseError as e:
        # Wrong signing region: read the correct one from the error body
        if e.status == 400 and e.error_code == 'AuthorizationHeaderMalformed':
            conn.auth_region_name = ET.fromstring(e.body).find('./Region').text
    billing_bucket = conn.get_bucket(credentialsdict["bucket_name"])
    print(billing_bucket)

    if not billing_bucket:
        raise Exception("Please Enter valid bucket name. Bucket %s does not exist"
                        % credentialsdict.get("bucket_name"))
    for key in billing_bucket.list():
        print(key.name)
    del os.environ['S3_USE_SIGV4']

Can I pass a search string to retrieve only the matching filenames?

sangeeth kumar asked May 19 '26 04:05



1 Answer

You can use JMESPath expressions to search and filter the object listing of an S3 bucket. The filter is applied client-side to each page of results, so you first need a boto3 paginator for list_objects_v2:

import boto3
client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')
page_iterator = paginator.paginate(Bucket="your_bucket_name")

Now that you have the iterator, you can run a JMESPath search on it. The most useful function is contains, which works like a SQL %like% query:

objects = page_iterator.search("Contents[?contains(Key, `partial-file-name`)][]")

But in your case (finding all files ending in .csv), it is better to use ends_with, which works like a *.csv query:

objects = page_iterator.search("Contents[?ends_with(Key, `.csv`)][]")
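Since the question asks about passing a search string, here is a minimal sketch of building that expression from a variable. The helper name suffix_filter_expression is made up for illustration. It uses JMESPath's single-quoted raw string literal instead of backticks, and it assumes the suffix contains no backslashes.

```python
def suffix_filter_expression(suffix):
    """Build a JMESPath filter that keeps objects whose Key ends with `suffix`."""
    # JMESPath raw string literals are single-quoted; escape embedded quotes.
    escaped = suffix.replace("'", "\\'")
    return "Contents[?ends_with(Key, '%s')][]" % escaped


# Hypothetical usage with the page_iterator created earlier:
# objects = page_iterator.search(suffix_filter_expression(".csv"))
```

The same idea should work for contains: swap the function name inside the expression.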

Then you can get the object keys with:

for item in objects:
    print(item['Key'])
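To get from the key listing to the actual download the question asks for, something like the sketch below might do. It is not the JMESPath approach: it filters each page in plain Python with str.endswith, which also avoids tripping over an empty bucket, where a page has no "Contents" entry. The function names, the dest_dir parameter and the "exactly one match" rule are my assumptions; client is expected to be a boto3 S3 client.

```python
import os


def find_keys_by_suffix(client, bucket, suffix):
    """Return every object key in `bucket` that ends with `suffix`."""
    paginator = client.get_paginator("list_objects_v2")
    matches = []
    for page in paginator.paginate(Bucket=bucket):
        # A page for an empty bucket has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                matches.append(obj["Key"])
    return matches


def download_single_match(client, bucket, suffix, dest_dir="."):
    """Download the one key ending in `suffix`; raise if there are zero or several."""
    keys = find_keys_by_suffix(client, bucket, suffix)
    if len(keys) != 1:
        raise ValueError("expected exactly one match, found %d: %r" % (len(keys), keys))
    # Save under the key's base name inside dest_dir (assumed to exist already).
    local_path = os.path.join(dest_dir, os.path.basename(keys[0]))
    client.download_file(bucket, keys[0], local_path)
    return local_path
```

With real credentials configured, a call would look like download_single_match(boto3.client('s3'), "your_bucket_name", ".csv"). Raising on several matches is a deliberate choice here, since the question expects a single file; return the list from find_keys_by_suffix instead if you want to pick one yourself.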

This answer is based on https://blog.jeffbryner.com/2020/04/21/jupyter-pandas-analysis.html and https://stackoverflow.com/a/27274997/4587704

mojeto answered May 21 '26 17:05



