I'm trying to get the files from specific folders in s3 Buckets:
I have 4 buckets in s3 with the following names:
1 - 'PDF'
2 - 'TXT'
3 - 'PNG'
4 - 'JPG'
The folder structure for all s3 buckets looks like this:
1- PDF/analysis/pdf-to-img/processed/files
2- TXT/report/processed/files
3- PNG/analysis/reports/png-to-txt/processed/files
4- JPG/jpg-to-txt/empty
I have to check if this folder prefix processed/files
is present in the bucket, and if it is present, I'll read the files present in those directories, else I'll ignore them.
Code:
buckets = ['PDF','TXT','PNG','JPG']
client = boto3.client('s3')
for i in bucket:
result = client.list_objects(Bucket=i,Prefix = 'processed/files', Delimiter='/')
print(result)
I can enter into each directory if the folder structure is same, but how can I handle this when the folder structure varies for each bucket?
This is maybe a lengthy process.
buckets = ['PDF','TXT','PNG','JPG']
s3_client = getclient('s3')
for i in buckets:
result = s3_client.list_objects(Bucket= i, Prefix='', Delimiter ='')
contents = result.get('Contents')
for content in contents:
if 'processed/files/' in content.get('Key'):
print("Do the process")
You can get the list of directories from the s3 bucket. If it contains the required folder do the required process.
import boto3
client = boto3.client('s3')
bucket_name = "bucket_name"
prefix = ""
s3 = boto3.client("s3")
result = client.list_objects(Bucket=bucket_name, Delimiter='/')
for obj in result.get('CommonPrefixes'):
prefix = obj.get('Prefix')
file_list = ListFiles(client,bucket_name,prefix)
for file in file_list:
if "processed/files" in file:
print("Found",file)
def ListFiles(client, bucket_name, prefix):
_BUCKET_NAME = bucket_name
_PREFIX = prefix
"""List files in specific S3 URL"""
response = client.list_objects(Bucket=_BUCKET_NAME, Prefix=_PREFIX)
for content in response.get('Contents', []):
#print(content)
yield content.get('Key')
]1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With