I have connected to Amazon S3 and am trying to read JSON content from multiple buckets using the code below.
However, I only need to read specific JSON files, not all of them. How do I do it?
Code:
for i in bucket:
    try:
        result = client.list_objects(Bucket=i, Prefix='PROCESSED_BY/FILE_JSON', Delimiter='/')
        content_object = s3.Object(i, "PROCESSED_BY/FILE_JSON/?Account.json")
        file_content = content_object.get()['Body'].read().decode('utf-8')
        json_content = json.loads(file_content)
    except KeyError:
        pass
Bucket structure example:
test-eob/PROCESSED_BY/FILE_JSON/222-Account.json
test-eob/PROCESSED_BY/FILE_JSON/1212121-Account.json
test-eob/PROCESSED_BY/FILE_JSON/122-multi.json
test-eob/PROCESSED_BY/FILE_JSON/qwqwq-Account.json
test-eob/PROCESSED_BY/FILE_JSON/wqwqw-multi.json
From the above list, I want to only read *-Account.json files.
How can I achieve this?
In the Amazon S3 console, choose your S3 bucket, choose the file that you want to open or download, choose Actions, and then choose Open or Download. If you are downloading an object, specify where you want to save it. The procedure for saving the object depends on the browser and operating system that you are using.
To allow Read and Write access to an object in an Amazon S3 bucket and also include additional permissions for console access, see Amazon S3: Allows read and write access to objects in an S3 Bucket, programmatically and in the console.
Amazon S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only), and server-side encrypted objects. You can specify the format of the results as either CSV or JSON, and you can determine how the records in the result are delimited.
From the Amazon S3 console, choose the bucket with the object that you want to update. Navigate to the folder that contains the object. From the object list, choose the name of the object. Choose the Permissions tab. Under Public access, choose Everyone. In the Everyone dialog box, for Access to the object, select Read object.
There are several ways to do this in Python. For example, checking if 'stringA' is in 'stringB':
list1 = ['test-eob/PROCESSED_BY/FILE_JSON/222-Account.json',
         'test-eob/PROCESSED_BY/FILE_JSON/1212121-Account.json',
         'test-eob/PROCESSED_BY/FILE_JSON/122-multi.json',
         'test-eob/PROCESSED_BY/FILE_JSON/qwqwq-Account.json',
         'test-eob/PROCESSED_BY/FILE_JSON/wqwqw-multi.json']
for i in list1:
    if 'Account' in i:
        print(i)
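Note that a plain substring test would also match a key that merely contains "Account" somewhere in its path. A slightly stricter variant, sketched below, checks the suffix instead. The helper name is_account_file is made up for illustration, and the keys are the ones from the question.

```python
def is_account_file(key):
    """Return True only for keys that end in '-Account.json'."""
    return key.endswith("-Account.json")


keys = [
    "test-eob/PROCESSED_BY/FILE_JSON/222-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/1212121-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/122-multi.json",
    "test-eob/PROCESSED_BY/FILE_JSON/qwqwq-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/wqwqw-multi.json",
]

# Keep only the keys whose name ends with the wanted suffix.
account_keys = [k for k in keys if is_account_file(k)]
print(account_keys)
```

str.endswith also accepts a tuple of suffixes, which may help if more than one file type needs to be read.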
You can use a regex to pick out the keys that match your pattern from the list of objects.
import re
# Escape the dot and anchor the end so only keys ending in "-Account.json" match.
MATCH = r"FILE_JSON/.*-Account\.json$"
full_list = [
    "test-eob/PROCESSED_BY/FILE_JSON/222-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/1212121-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/122-multi.json",
    "test-eob/PROCESSED_BY/FILE_JSON/qwqwq-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/wqwqw-multi.json"
]
for item in full_list:
    if re.search(MATCH, item):
        print(item)
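Both answers filter a hard-coded list, so here is a rough sketch of how the same filtering might be wired into the S3 loop from the question. It assumes boto3 is installed and credentials are configured. The function names matching_keys and read_account_json are hypothetical, and it assumes the object keys start at PROCESSED_BY/ (the leading test-eob in the question looks like the bucket name, which is not part of the key). S3 has no server-side wildcard for keys, so the sketch lists everything under the prefix and filters client-side.

```python
import fnmatch
import json


def matching_keys(keys, pattern="PROCESSED_BY/FILE_JSON/*-Account.json"):
    """Return the keys that match a shell-style wildcard pattern.

    Note: in fnmatch, '*' also matches '/', so keys in nested
    "sub-folders" under the prefix would match as well.
    """
    return [k for k in keys if fnmatch.fnmatchcase(k, pattern)]


def read_account_json(bucket_names, prefix="PROCESSED_BY/FILE_JSON/"):
    """Load every *-Account.json object under `prefix` in each bucket.

    Hypothetical helper: returns a dict keyed by (bucket, key).
    """
    import boto3  # imported lazily so matching_keys works without boto3

    client = boto3.client("s3")
    paginator = client.get_paginator("list_objects_v2")
    results = {}
    for bucket in bucket_names:
        keys = []
        # Paginate, since a single listing call returns at most 1000 keys.
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        for key in matching_keys(keys, prefix + "*-Account.json"):
            body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
            results[(bucket, key)] = json.loads(body.decode("utf-8"))
    return results
```

Calling read_account_json(["test-eob"]) would then return the parsed content of each matching file, provided the bucket exists and is readable with the configured credentials.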