I need to create a CSV and upload it to an S3 bucket. Since I'm creating the file on the fly, it would be better if I could write it directly to the S3 bucket as it is being created, rather than writing the whole file locally and then uploading it at the end.
Is there a way to do this? My project is in Python and I'm fairly new to the language. Here is what I tried so far:
import csv
import io
import boto
from boto.s3.key import Key

conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')
k = Key(bucket)
k.key = 'foo/foobar'

fieldnames = ['first_name', 'last_name']
writer = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
k.set_contents_from_stream(writer.writeheader())
I received this error: BotoClientError: s3 does not support chunked transfer
UPDATE: I found a way to write directly to S3, but I can't find a way to clear the buffer without actually deleting the lines I already wrote. So, for example:
conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')
k = Key(bucket)
k.key = 'foo/foobar'

testDict = [{"fieldA": "8", "fieldB": None, "fieldC": "888888888888"},
            {"fieldA": "9", "fieldB": None, "fieldC": "99999999999"}]

f = io.StringIO()
fieldnames = ['fieldA', 'fieldB', 'fieldC']
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
k.set_contents_from_string(f.getvalue())
for row in testDict:
    writer.writerow(row)
    k.set_contents_from_string(f.getvalue())
f.close()
This writes 3 lines to the file; however, I can't release the memory held by the buffer, so this won't scale to a big file. If I add:
f.seek(0)
f.truncate(0)
to the loop, then only the last line of the file is written. Is there any way to release resources without deleting lines from the file?
You can set up an Amazon Kinesis Data Firehose delivery stream to start streaming your data to Amazon S3 buckets using the following steps:
Step 1: Sign in to the AWS Console and open Amazon Kinesis.
Step 2: Configure the delivery stream, choosing your S3 bucket as the destination.
Step 3: Optionally transform records using a Lambda function.
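For reference, here is a minimal sketch of pushing rows into such a delivery stream with boto3; the stream name 'csv-to-s3' is hypothetical and it assumes the Firehose stream has already been created with your bucket as its S3 destination:

import boto3

# Assumes an existing Firehose delivery stream named 'csv-to-s3' (hypothetical)
# whose destination is your S3 bucket; Firehose batches these records into S3 objects.
firehose = boto3.client('firehose')

row = '8,,888888888888\n'  # one CSV line per record
firehose.put_record(
    DeliveryStreamName='csv-to-s3',
    Record={'Data': row.encode('utf-8')}
)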
There are three main ways to upload a file to Amazon S3: through the AWS Management Console, with the AWS CLI, or programmatically with an SDK such as boto3.
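The SDK route with boto3 is a one-liner once you have a local file; this is just a sketch with placeholder file, bucket, and key names:

import boto3

# Sketch of the SDK route; file, bucket, and key names are placeholders.
s3 = boto3.client('s3')
s3.upload_file('local.csv', 'dev-vs', 'foo/foobar.csv')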
When you upload large files to Amazon S3, it's a best practice to leverage multipart uploads. If you're using the AWS Command Line Interface (AWS CLI), then all high-level aws s3 commands automatically perform a multipart upload when the object is large. These high-level commands include aws s3 cp and aws s3 sync.
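If you're using the SDK instead of the CLI, boto3's transfer layer does the same thing; the sketch below assumes boto3 and uses placeholder file and bucket names, with an 8 MB threshold and part size:

import boto3
from boto3.s3.transfer import TransferConfig

# Objects above multipart_threshold are split into multipart_chunksize parts,
# mirroring what `aws s3 cp` does automatically for large files.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024)
s3 = boto3.client('s3')
s3.upload_file('big_file.csv', 'dev-vs', 'foo/big_file.csv', Config=config)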
You can upload any file type (images, backups, data, movies, etc.) into an S3 bucket. The maximum size of a file that you can upload by using the Amazon S3 console is 160 GB. To upload a file larger than 160 GB, use the AWS CLI, an AWS SDK, or the Amazon S3 REST API.
I did find a solution to my question, which I'll post here in case anyone else is interested. You can't stream directly to S3, but you can break the upload into the parts of a multipart upload. There is also a package, Smart Open (smart_open), that presents a streaming file interface and turns your writes into a multipart upload behind the scenes, which is what I used:
import smart_open
import io
import csv

testDict = [{"fieldA": "8", "fieldB": None, "fieldC": "888888888888"},
            {"fieldA": "9", "fieldB": None, "fieldC": "99999999999"}]
fieldnames = ['fieldA', 'fieldB', 'fieldC']

f = io.StringIO()
with smart_open.smart_open('s3://dev-test/bar/foo.csv', 'wb') as fout:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    fout.write(f.getvalue())
    for row in testDict:
        f.seek(0)
        f.truncate(0)
        writer.writerow(row)
        fout.write(f.getvalue())
f.close()
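Note that newer releases of smart_open expose smart_open.open instead of smart_open.smart_open, and in text mode it hands back a file-like stream you can give straight to csv, so the intermediate StringIO shuffle isn't needed. A rough sketch, using the same bucket, key, and data as above:

import csv
import smart_open

testDict = [{"fieldA": "8", "fieldB": None, "fieldC": "888888888888"},
            {"fieldA": "9", "fieldB": None, "fieldC": "99999999999"}]
fieldnames = ['fieldA', 'fieldB', 'fieldC']

# The returned stream is file-like and backed by a multipart upload,
# so csv can write to it directly.
with smart_open.open('s3://dev-test/bar/foo.csv', 'w') as fout:
    writer = csv.DictWriter(fout, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(testDict)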
We were trying to upload file contents to S3 when they came through as an InMemoryUploadedFile object in a Django request. We ended up doing the following because we didn't want to save the file locally. Hope it helps:
@action(detail=False, methods=['post'])
def upload_document(self, request):
    document = request.data.get('image').file
    s3.upload_fileobj(document, BUCKET_NAME, DESIRED_NAME_OF_FILE_IN_S3,
                      ExtraArgs={"ServerSideEncryption": "aws:kms"})
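upload_fileobj accepts any file-like object, not just Django's InMemoryUploadedFile, so the same call works with an in-memory buffer. A small sketch with placeholder bucket and key names:

import io
import boto3

# Any file-like object works; here an in-memory CSV buffer stands in
# for InMemoryUploadedFile.file. Bucket and key names are placeholders.
s3 = boto3.client('s3')
buffer = io.BytesIO(b"fieldA,fieldB,fieldC\n8,,888888888888\n")
s3.upload_fileobj(buffer, 'dev-vs', 'foo/foobar.csv',
                  ExtraArgs={"ServerSideEncryption": "aws:kms"})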