Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read a file line by line from S3 using boto?

I have a csv file in S3 and I'm trying to read the header line to get the size (these files are created by our users so they could be almost any size). Is there a way to do this using boto? I thought maybe I could us a python BufferedReader, but I can't figure out how to open a stream from an S3 key. Any suggestions would be great. Thanks!

like image 963
gignosko Avatar asked Feb 19 '15 22:02

gignosko


People also ask

Can I read S3 file without downloading?

Reading objects without downloading them Similarly, if you want to upload and read small pieces of textual data such as quotes, tweets, or news articles, you can do that using the S3 resource method put(), as demonstrated in the example below (Gist).

How do I view files in S3 bucket?

In the Amazon S3 console, choose your S3 bucket, choose the file that you want to open or download, choose Actions, and then choose Open or Download. If you are downloading an object, specify where you want to save it.


3 Answers

Here's a solution which actually streams the data line by line:

from io import TextIOWrapper
from gzip import GzipFile
...

# get StreamingBody from botocore.response
response = s3.get_object(Bucket=bucket, Key=key)
# if gzipped
gzipped = GzipFile(None, 'rb', fileobj=response['Body'])
data = TextIOWrapper(gzipped)

for line in data:
    # process line
like image 111
kooshywoosh Avatar answered Oct 25 '22 03:10

kooshywoosh


You may find https://pypi.python.org/pypi/smart_open useful for your task.

From documentation:

for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
    print line
like image 20
Michael Korbakov Avatar answered Oct 25 '22 02:10

Michael Korbakov


I know it's a very old question.

But as for now, we can just use s3_conn.get_object(Bucket=bucket, Key=key)['Body'].iter_lines()

like image 22
peon Avatar answered Oct 25 '22 03:10

peon