 

Read only particular JSON files from S3 buckets across multiple folders

I am trying to loop over all buckets in S3, check whether a prefix matches, and if it does, go into those folders and read the JSON files.

I have tried to get the folders that contain the prefix, but I am failing to enter them.

Code:

import boto3

bucket = ['test-eob', 'test-eob-images']
client = boto3.client('s3')
for i in bucket:
    result = client.list_objects(Bucket=i, Prefix='PROCESSED_BY/FILE_JSON', Delimiter='/')
    print(result)

Using this I get the results for buckets that have the prefix, but it fails when a bucket doesn't have that prefix.

The structure of test-eob is test-eob/PROCESSED_BY/FILE_JSON/*.json. I have to read the JSON only if my prefix matches; otherwise, move on to the next bucket.

Can anyone help me out here?

asked Jun 02 '20 by pylearner




1 Answer

Try to catch the error (is it a KeyError?) when the bucket does not contain the prefix.

For example:

for i in bucket:
    try:
        result = client.list_objects(Bucket=i, Prefix='PROCESSED_BY/FILE_JSON', Delimiter='/')
        print(result)
    except KeyError:
        # skip this bucket if the lookup raises a KeyError
        pass
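Note that with Delimiter='/', keys living under PROCESSED_BY/FILE_JSON/ typically come back grouped under CommonPrefixes rather than Contents. A minimal exception-free sketch, assuming the same client and bucket list, and dropping the Delimiter so the matching objects themselves are listed:

for i in bucket:
    # without a Delimiter, every key under the prefix is returned in 'Contents'
    result = client.list_objects(Bucket=i, Prefix='PROCESSED_BY/FILE_JSON')
    if 'Contents' in result:  # present only when at least one key matched
        for obj in result['Contents']:
            print(obj['Key'])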

To read the JSON, there are several ways; for example, with json.loads() from the json module.

So for each object in the bucket:

import json

s3 = boto3.resource('s3')  # the resource API, not the client used earlier
content_object = s3.Object(bucket_name, file_name)
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
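
Putting the two parts together for the original question, a rough end-to-end sketch (assuming the bucket names and prefix from the question, dropping the Delimiter so the objects themselves are listed, and filtering keys on the .json suffix) could look like this:

import json
import boto3

buckets = ['test-eob', 'test-eob-images']
prefix = 'PROCESSED_BY/FILE_JSON/'

client = boto3.client('s3')
s3 = boto3.resource('s3')

for name in buckets:
    result = client.list_objects(Bucket=name, Prefix=prefix)
    # 'Contents' is missing when a bucket has nothing under the prefix
    for obj in result.get('Contents', []):
        if not obj['Key'].endswith('.json'):
            continue
        body = s3.Object(name, obj['Key']).get()['Body'].read().decode('utf-8')
        print(name, obj['Key'], json.loads(body))

Note that list_objects returns at most 1,000 keys per call; for larger folders you would want list_objects_v2 with a paginator.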
answered Nov 14 '22 by Adi Dembak