Read a file line by line from S3 using boto?

I have a CSV file in S3 and I'm trying to read the header line to get the size (these files are created by our users, so they could be almost any size). Is there a way to do this using boto? I thought maybe I could use a Python BufferedReader, but I can't figure out how to open a stream from an S3 key. Any suggestions would be great. Thanks!

asked Feb 19 '15 22:02

gignosko

3 Answers

Here's a solution which actually streams the data line by line:

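from gzip import GzipFile
from io import TextIOWrapper
...

# Assumes s3 is a boto3 S3 client (e.g. boto3.client('s3')) and that
# bucket and key are already defined.
# get_object returns a dict; response['Body'] is a botocore.response.StreamingBody
response = s3.get_object(Bucket=bucket, Key=key)
# if gzipped, decompress lazily while reading from the stream
gzipped = GzipFile(None, 'rb', fileobj=response['Body'])
data = TextIOWrapper(gzipped)

for line in data:
    # process line (each line is a str that still ends with its newline)
    print(line.rstrip('\n'))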
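Since the question only needs the header row, here is a minimal sketch of the same wrapping idea that stops after the first CSV row. The names read_csv_header and header_from_s3 are hypothetical helpers, not part of boto3. Passing the uncompressed StreamingBody straight to TextIOWrapper is an assumption that may not hold on older botocore versions.

```python
import csv
import gzip
import io


def read_csv_header(raw, gzipped=False, encoding="utf-8"):
    """Return the first CSV row from a binary file-like object.

    Only the bytes the wrappers buffer are read, not the whole object.
    """
    if gzipped:
        # Decompress lazily as bytes are pulled from the underlying stream.
        raw = gzip.GzipFile(fileobj=raw, mode="rb")
    # newline="" lets the csv module handle line endings itself.
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    # An empty object has no header row, so fall back to an empty list.
    return next(csv.reader(text), [])


def header_from_s3(bucket, key, gzipped=False):
    """Hypothetical helper: fetch the object and read its header row."""
    import boto3  # imported lazily so read_csv_header works without boto3

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return read_csv_header(body, gzipped=gzipped)
```

Using the csv module rather than splitting on commas keeps quoted header fields intact, and len() of the returned list gives the number of columns.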

answered Oct 25 '22 03:10

kooshywoosh

You may find https://pypi.python.org/pypi/smart_open useful for your task.

From documentation:

for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
    print(line)
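For reading only the header, a rough sketch along the same lines follows. It assumes a recent smart_open release, where the entry point is smart_open.open, meant as a drop-in replacement for the built-in open. The function read_first_line and its opener parameter are hypothetical, added so the same code can be tried on a local file.

```python
def read_first_line(uri, opener=None):
    """Return the first line of the resource at uri, without its line ending."""
    if opener is None:
        # Third-party package; assumed to be installed with S3 support.
        from smart_open import open as opener
    with opener(uri, "r") as fh:
        # readline() stops at the first newline instead of reading everything.
        return fh.readline().rstrip("\r\n")


# Usage against S3 (bucket and key taken from the example above):
#   header = read_first_line("s3://mybucket/mykey.txt")
```

Passing opener=open exercises the same function against a local path, which is handy for testing without AWS credentials.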

answered Oct 25 '22 02:10

Michael Korbakov

I know this is a very old question, but nowadays we can just use s3_conn.get_object(Bucket=bucket, Key=key)['Body'].iter_lines(), which yields each line as bytes.
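A small sketch of how that could be used to grab just the header: iter_lines() yields bytes with the line ending removed, so the line is decoded before use. The helper name first_line is hypothetical, and UTF-8 is an assumed default encoding.

```python
def first_line(body, encoding="utf-8"):
    """Return the first line of a streaming body as text, or None if it is empty."""
    # Pull a single item from the line iterator; later lines are never requested.
    raw = next(iter(body.iter_lines()), None)
    return None if raw is None else raw.decode(encoding)


# Usage against S3 (assumes boto3 is installed and credentials are configured;
# the bucket and key names are placeholders):
#   import boto3
#   body = boto3.client("s3").get_object(Bucket="my-bucket", Key="data.csv")["Body"]
#   header = first_line(body)
#   columns = header.split(",")  # naive split; use the csv module for quoted fields
```

Because only the first item is taken from the iterator, the rest of the object is not consumed by this function.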

answered Oct 25 '22 03:10

peon

Tags:

python

amazon-web-services

amazon-s3

boto