With boto3, you can read a file's contents from S3, given a bucket name and a key, like this (it assumes a preliminary import boto3):
s3 = boto3.resource('s3')
content = s3.Object(BUCKET_NAME, S3_KEY).get()['Body'].read()
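Note that in Python 3 the read() call returns bytes, not str, so the content usually needs to be decoded before being treated as text. A minimal sketch of that step (io.BytesIO stands in for the S3 Body here so the snippet is self-contained; the sample lines are made up):

```python
import io

# Stand-in for s3.Object(BUCKET_NAME, S3_KEY).get()['Body'] -- the real
# streaming Body behaves like a binary file object.
body = io.BytesIO(b"{'a': 1}\n{'b': 2}\n")

raw = body.read()              # bytes
content = raw.decode('utf-8')  # str
print(content.splitlines())
```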
This returns the whole content at once (as bytes in Python 3). The specific file I need to fetch happens to be a collection of dictionary-like objects, one per line, so it is not valid JSON. Instead of reading it all in as one blob, I'd like to stream it as a file object and read it line by line. I cannot find a way to do this other than downloading the file locally first:
s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET_NAME)
filename = 'my-file'
bucket.download_file(S3_KEY, filename)
f = open(filename)
What I'm asking is whether it's possible to have this kind of control over the file without having to download it locally first.
In the Amazon S3 console, choose your S3 bucket, choose the file that you want to open or download, choose Actions, and then choose Open or Download. If you are downloading an object, specify where you want to save it. The procedure for saving the object depends on the browser and operating system that you are using.
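Programmatically, the Body object returned by get() is a file-like streaming object, so it can be wrapped for text-mode, line-by-line iteration without downloading the whole file first. A sketch of the idea (io.BytesIO stands in for the S3 Body so the example is self-contained and the sample lines are made up; with a real object you would pass s3.Object(BUCKET_NAME, S3_KEY).get()['Body'] instead -- whether the wrapper accepts it directly depends on your botocore version, so verify against your installation):

```python
import io

# Stand-in for the streaming Body; botocore's StreamingBody is similarly
# file-like (it is an IOBase subclass in recent versions -- an assumption
# to check against the botocore you have installed).
body = io.BytesIO(b"{'id': 1}\n{'id': 2}\n{'id': 3}\n")

lines = []
# Wrap the binary stream so we can iterate decoded text lines without
# reading the entire object into memory at once.
for line in io.TextIOWrapper(body, encoding='utf-8'):
    lines.append(line.rstrip('\n'))

print(lines)
```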
I found .splitlines() worked for me...
txt_file = s3.Object(bucket, file).get()['Body'].read().decode('utf-8').splitlines()
Without the .splitlines(), the whole blob of text was returned, and trying to iterate it line by line resulted in each char being iterated instead. With .splitlines(), iteration by line was achievable.
In my example here I iterate through each line and split it into a list of fields.
txt_file = s3.Object(bucket, file).get()['Body'].read().decode(
    'utf-8').splitlines()

for line in txt_file:
    arr = line.split()
    print(arr)
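Since the question's file contains one dictionary-like literal per line, each line can also be parsed into an actual dict with the standard library's ast.literal_eval. A sketch (the sample lines are made up, standing in for lines read from S3):

```python
import ast

# Each line is a Python dict literal, as described in the question.
lines = ["{'name': 'a', 'n': 1}", "{'name': 'b', 'n': 2}"]

# literal_eval safely parses literals without executing arbitrary code.
records = [ast.literal_eval(line) for line in lines]
print(records)
```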