With boto3, you can read a file's contents from S3, given a bucket name and a key, like this (it assumes a preliminary import boto3):
s3 = boto3.resource('s3')
content = s3.Object(BUCKET_NAME, S3_KEY).get()['Body'].read()
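Note that in Python 3 the read() call returns bytes, not str, so the content usually needs to be decoded before being treated as text. A minimal sketch of that step (io.BytesIO stands in for the S3 Body here so the snippet is self-contained; the sample lines are made up):

```python
import io

# Stand-in for s3.Object(BUCKET_NAME, S3_KEY).get()['Body'] -- the real
# streaming Body behaves like a binary file object.
body = io.BytesIO(b"{'a': 1}\n{'b': 2}\n")

raw = body.read()              # bytes
content = raw.decode('utf-8')  # str
print(content.splitlines())
```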
This returns the whole content at once (as bytes in Python 3). The specific file I need to fetch happens to be a collection of dictionary-like objects, one per line, so it is not valid JSON. Instead of reading it all in as one blob, I'd like to stream it as a file object and read it line by line. I cannot find a way to do this other than downloading the file locally first:
s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET_NAME)
filename = 'my-file'
bucket.download_file(S3_KEY, filename)
f = open(filename)
What I'm asking is whether it's possible to have this kind of control over the file without having to download it locally first.
In the Amazon S3 console, choose your S3 bucket, choose the file that you want to open or download, choose Actions, and then choose Open or Download. If you are downloading an object, specify where you want to save it. The procedure for saving the object depends on the browser and operating system that you are using.
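Programmatically, the Body object returned by get() is a file-like streaming object, so it can be wrapped for text-mode, line-by-line iteration without downloading the whole file first. A sketch of the idea (io.BytesIO stands in for the S3 Body so the example is self-contained and the sample lines are made up; with a real object you would pass s3.Object(BUCKET_NAME, S3_KEY).get()['Body'] instead -- whether the wrapper accepts it directly depends on your botocore version, so verify against your installation):

```python
import io

# Stand-in for the streaming Body; botocore's StreamingBody is similarly
# file-like (it is an IOBase subclass in recent versions -- an assumption
# to check against the botocore you have installed).
body = io.BytesIO(b"{'id': 1}\n{'id': 2}\n{'id': 3}\n")

lines = []
# Wrap the binary stream so we can iterate decoded text lines without
# reading the entire object into memory at once.
for line in io.TextIOWrapper(body, encoding='utf-8'):
    lines.append(line.rstrip('\n'))

print(lines)
```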
I found .splitlines() worked for me...
txt_file = s3.Object(bucket, file).get()['Body'].read().decode('utf-8').splitlines()
Without the .splitlines(), the whole blob of text was returned, and trying to iterate it line by line resulted in each char being iterated instead. With .splitlines(), iteration by line was achievable.
In my example here I iterate through each line and split it into a list of fields.
txt_file = s3.Object(bucket, file).get()['Body'].read().decode(
    'utf-8').splitlines()

for line in txt_file:
    arr = line.split()
    print(arr)
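Since the question's file contains one dictionary-like literal per line, each line can also be parsed into an actual dict with the standard library's ast.literal_eval. A sketch (the sample lines are made up, standing in for lines read from S3):

```python
import ast

# Each line is a Python dict literal, as described in the question.
lines = ["{'name': 'a', 'n': 1}", "{'name': 'b', 'n': 2}"]

# literal_eval safely parses literals without executing arbitrary code.
records = [ast.literal_eval(line) for line in lines]
print(records)
```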