 

Search each S3 bucket and check whether a given folder exists

I'm trying to get the files from specific folders in S3 buckets.

I have 4 buckets in S3 with the following names:

1 - 'PDF'
2 - 'TXT'
3 - 'PNG'
4 - 'JPG'

The folder structure in each S3 bucket looks like this:

1- PDF/analysis/pdf-to-img/processed/files
2- TXT/report/processed/files
3- PNG/analysis/reports/png-to-txt/processed/files
4- JPG/jpg-to-txt/empty

I need to check whether the folder prefix processed/files is present in each bucket. If it is, I'll read the files in that directory; otherwise I'll ignore the bucket.
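Once a bucket's processed/files keys are known, reading them takes one `get_object` call per key; the object's data is in the streaming `Body` of the response. This is a minimal sketch, assuming the files are small enough to hold in memory; `read_objects` is a hypothetical helper, and `client` is expected to be a `boto3.client('s3')` passed in by the caller.

```python
def read_objects(client, bucket, keys):
    """Download each key from `bucket` and return {key: bytes}.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Keys ending in "/" are zero-byte folder placeholders and are skipped.
    """
    contents = {}
    for key in keys:
        if key.endswith("/"):
            continue
        response = client.get_object(Bucket=bucket, Key=key)
        # Body is a StreamingBody; read() returns the whole object as bytes
        contents[key] = response["Body"].read()
    return contents
```

Passing the client in, rather than creating it inside the function, keeps the helper independent of credentials and easy to reuse across the four buckets.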


Code:

import boto3

buckets = ['PDF', 'TXT', 'PNG', 'JPG']

client = boto3.client('s3')
for i in buckets:
    # Prefix only matches keys that *start* with 'processed/files'
    result = client.list_objects(Bucket=i, Prefix='processed/files', Delimiter='/')
    print(result)

I can list each directory when the folder structure is the same, but how can I handle this when the structure differs from bucket to bucket?
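Because S3 has no real directories, a "folder" like processed/files is only part of each object key (for example analysis/pdf-to-img/processed/files/... in the PDF bucket). A varying structure can therefore be handled by locating the folder inside each key instead of assuming a fixed prefix. This is a minimal sketch, assuming you already have a bucket's keys as strings from some listing call; `find_folder_prefixes` is a hypothetical helper. It compares whole path segments, so unprocessed/files/ is not counted, and an empty result means the bucket has no such folder (as with JPG).

```python
def find_folder_prefixes(keys, folder="processed/files"):
    """Return the distinct key prefixes that end with `folder`.

    `folder` must appear as complete path segments: "processed/files"
    matches "a/processed/files/x.txt" but not "a/unprocessed/files/x.txt".
    """
    target = folder.strip("/").split("/")
    prefixes = set()
    for key in keys:
        # Drop the last segment (the file name); only directory parts count
        dirs = key.split("/")[:-1]
        for start in range(len(dirs) - len(target) + 1):
            if dirs[start:start + len(target)] == target:
                prefixes.add("/".join(dirs[:start + len(target)]) + "/")
    return sorted(prefixes)
```

Each returned prefix can then be passed as `Prefix=` to a listing call, so only that folder is read.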

asked Jun 02 '20 18:06 by pylearner



2 Answers

This may be a lengthy process, because it lists every object in each bucket.

import boto3

buckets = ['PDF', 'TXT', 'PNG', 'JPG']
s3_client = boto3.client('s3')
for i in buckets:
    # Lists the object keys in the bucket (at most 1,000 per call)
    result = s3_client.list_objects(Bucket=i)
    # 'Contents' is missing when the bucket is empty
    contents = result.get('Contents', [])
    for content in contents:
        if 'processed/files/' in content.get('Key'):
            print("Do the process")

This lists the object keys in each S3 bucket. If a key contains the required folder path, do the required processing.
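A single `list_objects` call returns at most 1,000 keys, so on larger buckets the loop above can miss matches. Below is a minimal sketch of the same scan using boto3's paginator for `list_objects_v2`, which follows continuation tokens automatically. `iter_keys` and `keys_in_folder` are hypothetical helpers, and `client` is assumed to be a `boto3.client('s3')`. The match is anchored on a leading "/" so that a key such as analysis/unprocessed/files/b.pdf is not picked up.

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every object key in `bucket` that starts with `prefix`.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page (or the bucket) has no objects
        for obj in page.get("Contents", []):
            yield obj["Key"]


def keys_in_folder(client, bucket, folder="processed/files/"):
    """Return the keys whose path contains `folder` as whole segments."""
    return [key for key in iter_keys(client, bucket)
            if ("/" + folder) in ("/" + key)]
```

For the question's buckets this would be called as `keys_in_folder(boto3.client('s3'), name)` for each bucket name, skipping any bucket for which the list comes back empty.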

answered Oct 13 '22 22:10 by Ani Guner



import boto3

client = boto3.client('s3')
bucket_name = "bucket_name"


def ListFiles(client, bucket_name, prefix):
    """List the object keys under `prefix` in the given bucket."""
    response = client.list_objects(Bucket=bucket_name, Prefix=prefix)
    for content in response.get('Contents', []):
        yield content.get('Key')


# Delimiter='/' groups keys into top-level "folders" (CommonPrefixes)
result = client.list_objects(Bucket=bucket_name, Delimiter='/')
for obj in result.get('CommonPrefixes', []):
    prefix = obj.get('Prefix')
    file_list = ListFiles(client, bucket_name, prefix)
    for file in file_list:
        if "processed/files" in file:
            print("Found", file)
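The answer above only looks one level deep in `CommonPrefixes`. With `Delimiter='/'`, S3 groups keys into "subfolders", so one way to locate processed/files/ at any depth, without listing every object, is to walk those prefixes level by level. This is a minimal sketch, assuming `client` is a `boto3.client('s3')` and that the folder tree is small enough to walk; `find_folder` is a hypothetical helper. It uses the `list_objects_v2` paginator so that levels with more than 1,000 subfolders are still covered.

```python
from collections import deque


def find_folder(client, bucket, folder="processed/files/"):
    """Return every prefix in `bucket` that ends with `folder`.

    Walks the "directory" tree breadth-first using Delimiter="/".
    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    found = []
    pending = deque([""])  # start at the bucket root
    while pending:
        prefix = pending.popleft()
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
            # Each CommonPrefixes entry is one immediate subfolder of `prefix`
            for sub in page.get("CommonPrefixes", []):
                child = sub["Prefix"]
                # Leading "/" keeps "unprocessed/files/" from matching
                if ("/" + child).endswith("/" + folder):
                    found.append(child)
                else:
                    pending.append(child)
    return found
```

Each prefix it returns can be fed to `ListFiles` (or any listing call) to read just that folder; an empty list means the bucket has no processed/files/ folder.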


answered Oct 13 '22 21:10 by aviboy2006
