Can you upload to S3 using a stream rather than a local file?

I need to create a CSV and upload it to an S3 bucket. Since I'm creating the file on the fly, it would be better if I could write it directly to the S3 bucket as it is being created, rather than writing the whole file locally and then uploading it at the end.

Is there a way to do this? My project is in Python and I'm fairly new to the language. Here is what I tried so far:

    import csv
    import io

    import boto
    from boto.s3.key import Key

    conn = boto.connect_s3()
    bucket = conn.get_bucket('dev-vs')
    k = Key(bucket)
    k.key = 'foo/foobar'

    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
    k.set_contents_from_stream(writer.writeheader())

I received this error: BotoClientError: s3 does not support chunked transfer

UPDATE: I found a way to write directly to S3, but I can't find a way to clear the buffer without actually deleting the lines I already wrote. So, for example:

    conn = boto.connect_s3()
    bucket = conn.get_bucket('dev-vs')
    k = Key(bucket)
    k.key = 'foo/foobar'

    testDict = [{
        "fieldA": "8",
        "fieldB": None,
        "fieldC": "888888888888"},
        {
        "fieldA": "9",
        "fieldB": None,
        "fieldC": "99999999999"}]

    f = io.StringIO()
    fieldnames = ['fieldA', 'fieldB', 'fieldC']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    k.set_contents_from_string(f.getvalue())

    for row in testDict:
        writer.writerow(row)
        k.set_contents_from_string(f.getvalue())

    f.close()

This writes 3 lines to the file; however, I'm unable to release memory, which I would need to do in order to write a big file. If I add:

    f.seek(0)
    f.truncate(0)

to the loop, then only the last line of the file ends up being written. Is there any way to release the buffer without losing the lines I have already written?
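(For anyone reading along: the reason only the last line survives is that each set_contents_from_string call replaces the entire S3 object with whatever is currently in the buffer, so once the buffer has been truncated only the newest row is left to upload. A minimal illustration of the buffer behavior, with no S3 involved:)

    import csv
    import io

    f = io.StringIO()
    writer = csv.DictWriter(f, fieldnames=['fieldA'])
    writer.writeheader()

    f.seek(0)
    f.truncate(0)              # buffer is now empty -- the header is gone from it
    writer.writerow({'fieldA': '8'})

    print(repr(f.getvalue()))  # '8\r\n' -- an upload at this point would overwrite the object with just this row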

asked Jun 24 '15 by inquiring minds



2 Answers

I did find a solution to my question, which I will post here in case anyone else is interested. I decided to do this in parts, as a multipart upload, since you can't stream to S3 directly. There is a package, Smart Open (smart_open), that turns a streaming write into a multipart upload, and that is what I used:

    import csv
    import io

    import smart_open

    testDict = [{
        "fieldA": "8",
        "fieldB": None,
        "fieldC": "888888888888"},
        {
        "fieldA": "9",
        "fieldB": None,
        "fieldC": "99999999999"}]

    fieldnames = ['fieldA', 'fieldB', 'fieldC']
    f = io.StringIO()

    # smart_open exposes the multipart upload behind a file-like object
    with smart_open.smart_open('s3://dev-test/bar/foo.csv', 'wb') as fout:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        fout.write(f.getvalue())

        for row in testDict:
            # reuse the small StringIO buffer: empty it, write one row, push that row to S3
            f.seek(0)
            f.truncate(0)
            writer.writerow(row)
            fout.write(f.getvalue())

    f.close()
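A note for later readers: in newer releases of smart_open the entry point was renamed from smart_open.smart_open to smart_open.open, and it can open the S3 object in text mode, so the intermediate StringIO buffer isn't needed. A minimal sketch, assuming a recent smart_open version and the same bucket/key as above:

    import csv
    import smart_open

    rows = [
        {"fieldA": "8", "fieldB": None, "fieldC": "888888888888"},
        {"fieldA": "9", "fieldB": None, "fieldC": "99999999999"},
    ]
    fieldnames = ['fieldA', 'fieldB', 'fieldC']

    # 'w' opens the S3 object for text writing; smart_open streams it as a multipart upload
    with smart_open.open('s3://dev-test/bar/foo.csv', 'w') as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)

Because the handle is text-mode, csv.DictWriter can write to the S3 stream directly, and the seek/truncate bookkeeping goes away.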
answered by inquiring minds


We were trying to upload file contents to S3 when they came through as an InMemoryUploadedFile object in a Django request. We ended up doing the following because we didn't want to save the file locally; hope it helps:

    import boto3
    from rest_framework.decorators import action

    s3 = boto3.client('s3')  # assuming a boto3 S3 client

    @action(detail=False, methods=['post'])
    def upload_document(self, request):
        # request.data.get('image') is an InMemoryUploadedFile; .file exposes the underlying file object
        document = request.data.get('image').file

        s3.upload_fileobj(document, BUCKET_NAME,
                          DESIRED_NAME_OF_FILE_IN_S3,
                          ExtraArgs={"ServerSideEncryption": "aws:kms"})
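For the original use case of a CSV generated on the fly, the same upload_fileobj call works with an in-memory buffer. A minimal sketch, assuming boto3 and the same example bucket/key as above (note this buffers the whole file in memory rather than streaming it):

    import csv
    import io

    import boto3

    s3 = boto3.client('s3')

    buf = io.BytesIO()
    # csv needs a text interface while upload_fileobj needs a binary one; TextIOWrapper bridges the two
    text = io.TextIOWrapper(buf, encoding='utf-8', newline='')

    writer = csv.DictWriter(text, fieldnames=['fieldA', 'fieldB', 'fieldC'])
    writer.writeheader()
    writer.writerow({"fieldA": "8", "fieldB": None, "fieldC": "888888888888"})
    text.flush()

    buf.seek(0)
    s3.upload_fileobj(buf, 'dev-test', 'bar/foo.csv')

Unlike the smart_open approach above, this holds the entire CSV in memory, so it is only suitable for files that fit in RAM.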
answered by Sean Saúl Astrakhan