
Read files with only specific names from Amazon S3

I have connected to Amazon S3 and am trying to read JSON content from multiple buckets using the code below.

However, I need to read only specific JSON files, not all of them. How do I do it?

Code:

for i in bucket:
    try:
        result = client.list_objects(Bucket=i, Prefix='PROCESSED_BY/FILE_JSON', Delimiter='/')
        content_object = s3.Object(i, "PROCESSED_BY/FILE_JSON/?Account.json")
        file_content = content_object.get()['Body'].read().decode('utf-8')
        json_content = json.loads(file_content)
    except KeyError:
        pass

Bucket structure example:

test-eob/PROCESSED_BY/FILE_JSON/222-Account.json
test-eob/PROCESSED_BY/FILE_JSON/1212121-Account.json
test-eob/PROCESSED_BY/FILE_JSON/122-multi.json
test-eob/PROCESSED_BY/FILE_JSON/qwqwq-Account.json
test-eob/PROCESSED_BY/FILE_JSON/wqwqw-multi.json

From the above list, I want to read only the *-Account.json files.

How can I achieve this?

pylearner asked Jun 02 '20 09:06



2 Answers

There are several ways to do this in Python. For example, you can check whether a substring such as 'Account' appears in each key:

list1 = [
    'test-eob/PROCESSED_BY/FILE_JSON/222-Account.json',
    'test-eob/PROCESSED_BY/FILE_JSON/1212121-Account.json',
    'test-eob/PROCESSED_BY/FILE_JSON/122-multi.json',
    'test-eob/PROCESSED_BY/FILE_JSON/qwqwq-Account.json',
    'test-eob/PROCESSED_BY/FILE_JSON/wqwqw-multi.json',
]

# Print only the keys that contain the substring 'Account'
for i in list1:
    if 'Account' in i:
        print(i)
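A plain substring check also matches keys where 'Account' appears anywhere, for example in a folder name. As a slightly stricter sketch, the `*-Account.json` wildcard from the question can be used directly with `fnmatch.fnmatchcase` from the standard library. The `keys` list and the `account_keys` name are only for illustration.

```python
from fnmatch import fnmatchcase

# Sample keys copied from the question
keys = [
    "test-eob/PROCESSED_BY/FILE_JSON/222-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/1212121-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/122-multi.json",
    "test-eob/PROCESSED_BY/FILE_JSON/qwqwq-Account.json",
    "test-eob/PROCESSED_BY/FILE_JSON/wqwqw-multi.json",
]

# fnmatchcase is case-sensitive on every platform; note that "*" also matches "/"
account_keys = [k for k in keys if fnmatchcase(k, "*/FILE_JSON/*-Account.json")]

for key in account_keys:
    print(key)
```

If the folder part never varies, `key.endswith("-Account.json")` would probably do the same job with less machinery.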
Adi Dembak answered Oct 20 '22 17:10


You can use a regex to pick out the keys that match your pattern from the list of objects.

import re

# Escape the dot and anchor at the end so only keys ending in "-Account.json" match
MATCH = r"FILE_JSON/.*-Account\.json$"

full_list = [
  "test-eob/PROCESSED_BY/FILE_JSON/222-Account.json",
  "test-eob/PROCESSED_BY/FILE_JSON/1212121-Account.json",
  "test-eob/PROCESSED_BY/FILE_JSON/122-multi.json",
  "test-eob/PROCESSED_BY/FILE_JSON/qwqwq-Account.json",
  "test-eob/PROCESSED_BY/FILE_JSON/wqwqw-multi.json"
]

for item in full_list:
  if re.search(MATCH, item):
    print(item)
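The same filter can be applied to the keys that S3 actually returns, rather than to a hard-coded list. Below is a rough sketch, not a definitive implementation. It assumes `test-eob` is the bucket name, that the objects live under the `PROCESSED_BY/FILE_JSON/` prefix, and that `client` is a boto3 S3 client (`boto3.client("s3")`). The function name `read_account_json` and its parameters are made up for this example.

```python
import json

PREFIX = "PROCESSED_BY/FILE_JSON/"   # assumed key prefix, taken from the question
SUFFIX = "-Account.json"             # only keys ending with this are read


def read_account_json(client, bucket, prefix=PREFIX, suffix=SUFFIX):
    """Yield (key, parsed JSON) for each object under prefix whose key ends with suffix."""
    # The paginator follows continuation tokens, so listings over 1000 keys are covered
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(suffix):
                continue  # skip *-multi.json and anything else
            body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
            yield key, json.loads(body.decode("utf-8"))


# Possible usage (requires boto3 and valid AWS credentials):
#   import boto3
#   client = boto3.client("s3")
#   for key, data in read_account_json(client, "test-eob"):
#       print(key, data)
```

S3 can only filter server-side by prefix, so the suffix check has to happen on the client after listing. To cover several buckets, call the function once per bucket name.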
Ramkumar R answered Oct 20 '22 18:10