Boto3 read a file content from S3 key line by line

With boto3, you can read a file's content from a location in S3, given a bucket name and the key, as follows (this assumes a preliminary import boto3):

s3 = boto3.resource('s3')
content = s3.Object(BUCKET_NAME, S3_KEY).get()['Body'].read()

This returns the whole content as a single string (a bytes object in Python 3). The specific file I need to fetch happens to be a collection of dictionary-like objects, one per line, so the file as a whole is not valid JSON. Instead of reading it as one blob, I'd like to stream it as a file object and read it line by line; I cannot find a way to do this other than downloading the file locally first, as in

s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET_NAME)

filename = 'my-file'
bucket.download_file(S3_KEY, filename)

f = open(filename)
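
With the local copy, the kind of line-by-line control I mean is just ordinary file iteration, roughly:

for line in f:
    record = line.strip()  # each line holds one dictionary-like record
    print(record)
f.close()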

What I'm asking is whether it's possible to have this kind of control over the file without having to download it locally first?

asked Nov 29 '17 by mar tin



1 Answer

I found .splitlines() worked for me...

txt_file = s3.Object(bucket, file).get()['Body'].read().decode('utf-8').splitlines()

Without .splitlines(), the whole blob of text was returned, and trying to iterate over it line by line actually iterated over each character. With .splitlines(), iterating line by line works as expected.

In my example here I iterate through each line and split it into fields, which can then be compiled into a dict.

# s3 is the boto3 resource; bucket and file are the bucket name and object key
txt_file = s3.Object(bucket, file).get()['Body'].read().decode(
        'utf-8').splitlines()

for line in txt_file:
    arr = line.split()  # split each line into whitespace-separated fields
    print(arr)
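
Note that .read() still pulls the whole object into memory before splitting. For large files, a rough streaming-style sketch (reusing the same s3 resource, bucket and file as above, plus the standard-library codecs module; recent botocore versions also expose an iter_lines() method on the body) would be:

import codecs

# Fetch only the streaming body; nothing is read into memory yet.
body = s3.Object(bucket, file).get()['Body']

# Wrap the body in a UTF-8 reader and iterate line by line,
# buffering a chunk at a time instead of the whole object.
for line in codecs.getreader('utf-8')(body):
    arr = line.split()
    print(arr)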
answered Sep 30 '22 by amcleod83