I have code that fetches an AWS S3 object. How do I read this StreamingBody with Python's csv.DictReader?
import boto3, csv
session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>)
s3_resource = session.resource('s3')
s3_object = s3_resource.Object(<bucket>, <key>)
streaming_body = s3_object.get()['Body']
#csv.DictReader(???)
csv.reader() gives you each row as a list that you access by index, which is fine for simple CSV files. csv.DictReader(), on the other hand, maps each row to the column names from the header row, which is friendlier to use, especially when a file has many columns.
csv.DictReader() returns an OrderedDict for each row on Python 3.6 and 3.7 (from Python 3.8 onward it returns a plain dict). That's why dict() is called explicitly inside the for loop: it converts each row to a regular dictionary.
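To make the difference concrete, here is a minimal sketch that parses the same data with both readers. The sample data and variable names are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# Hypothetical sample data, invented for this illustration.
sample = "name,age\nAlice,30\nBob,25\n"

# csv.reader yields each row (including the header) as a list of strings.
reader_rows = list(csv.reader(io.StringIO(sample)))

# csv.DictReader consumes the header row as field names and yields one
# mapping per data row; dict() makes each one a plain dictionary.
dict_rows = [dict(row) for row in csv.DictReader(io.StringIO(sample))]

print(reader_rows[1][0])      # first data row, accessed by position
print(dict_rows[0]["name"])   # same value, accessed by column name
```

Both print calls look up the same cell; only the access style differs.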
CSV, or "comma-separated values", is a common file format for data. The csv module helps you to elegantly process data stored within a CSV file. Also see the csv documentation. This guide uses the following example file, people.
The code would be something like this:
import boto3
import csv
# get a handle on s3
s3 = boto3.resource(u's3')
# get a handle on the bucket that holds your file
bucket = s3.Bucket(u'bucket-name')
# get a handle on the object you want (i.e. your file)
obj = bucket.Object(key=u'test.csv')
# get the object
response = obj.get()
# read the contents of the file and split it into a list of lines
# (splitlines() rather than split(), which would also break on spaces inside fields)
# for python 2:
# lines = response[u'Body'].read().splitlines()
# for python 3 you need to decode the incoming bytes:
lines = response['Body'].read().decode('utf-8').splitlines()
# now iterate over those lines
for row in csv.DictReader(lines):
    # here you get a sequence of dicts
    # do whatever you want with each line here
    print(row)
You can compact this a bit in actual code, but I tried to keep it step-by-step to show the object hierarchy with boto3.
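One detail worth checking when you turn the decoded text into a list of lines: str.split() with no arguments splits on any whitespace, not just newlines, so a field containing a space gets broken apart. A small sketch of the difference, using a made-up byte string in place of what response['Body'].read() would return:

```python
import csv

# Hypothetical payload standing in for response['Body'].read().
raw = b"name,city\nAlice Smith,New York\nBob Jones,Los Angeles\n"
text = raw.decode("utf-8")

# split() breaks on every run of whitespace, including spaces inside fields.
by_whitespace = text.split()

# splitlines() breaks only at line boundaries, so fields stay intact.
by_line = text.splitlines()

rows = list(csv.DictReader(by_line))
print(len(by_whitespace), len(by_line))
print(rows[0])
```

Note that splitting into lines up front would still separate a quoted field that contains an embedded newline; passing a file-like object to csv.DictReader avoids that case.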
Edit: Per your comment about avoiding reading the entire file into memory: I haven't run into that requirement so I can't speak authoritatively, but I would try wrapping the stream so that I get a text file-like iterator. For example, you could use the codecs library to replace the csv parsing section above with something like:
import codecs
for row in csv.DictReader(codecs.getreader('utf-8')(response[u'Body'])):
    print(row)
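If you want to try the codecs approach without touching S3, here is a rough sketch that wraps any binary file-like object. The helper name iter_csv_rows is hypothetical, and io.BytesIO is only a stand-in for response['Body']; I am assuming the real body behaves the same way because it exposes a compatible read() method.

```python
import codecs
import csv
import io


def iter_csv_rows(binary_stream, encoding="utf-8"):
    """Yield one dict per CSV row, decoding the binary stream incrementally."""
    # codecs.getreader returns a StreamReader class; instantiating it wraps the
    # binary stream in a text-mode iterator that yields decoded lines.
    text_stream = codecs.getreader(encoding)(binary_stream)
    for row in csv.DictReader(text_stream):
        yield dict(row)


# io.BytesIO is a stand-in for the S3 body (assumption for illustration only).
fake_body = io.BytesIO(b"name,age\nAlice,30\nBob,25\n")
streamed_rows = list(iter_csv_rows(fake_body))
print(streamed_rows)
```

Because the helper is a generator, rows are produced one at a time as the caller iterates, rather than after the whole object has been decoded.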