I need to create a CSV and upload it to an S3 bucket. Since I'm creating the file on the fly, it would be better if I could write it directly to the S3 bucket as it is being created, rather than writing the whole file locally and then uploading it at the end.
Is there a way to do this? My project is in Python and I'm fairly new to the language. Here is what I tried so far:
import csv
import io
import boto
from boto.s3.key import Key

conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')
k = Key(bucket)
k.key = 'foo/foobar'

fieldnames = ['first_name', 'last_name']
writer = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)
k.set_contents_from_stream(writer.writeheader())
I received this error: BotoClientError: s3 does not support chunked transfer
UPDATE: I found a way to write directly to S3, but I can't find a way to clear the buffer without actually deleting the lines I already wrote. So, for example:
conn = boto.connect_s3()
bucket = conn.get_bucket('dev-vs')
k = Key(bucket)
k.key = 'foo/foobar'

testDict = [{"fieldA": "8", "fieldB": None, "fieldC": "888888888888"},
            {"fieldA": "9", "fieldB": None, "fieldC": "99999999999"}]

f = io.StringIO()
fieldnames = ['fieldA', 'fieldB', 'fieldC']
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
k.set_contents_from_string(f.getvalue())
for row in testDict:
    writer.writerow(row)
    k.set_contents_from_string(f.getvalue())
f.close()
This writes 3 lines to the file; however, I can't release the memory held by the buffer, so this won't scale to a big file. If I add:
f.seek(0)
f.truncate(0)
to the loop, then only the last line of the file is written. Is there any way to release resources without deleting lines from the file?
You can set up an Amazon Kinesis Data Firehose delivery stream to start streaming your data to Amazon S3 buckets using the following steps:
Step 1: Sign in to the AWS Console and open Amazon Kinesis.
Step 2: Configure the delivery stream, choosing your S3 bucket as the destination.
Step 3: Optionally transform records using a Lambda function.
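For reference, here is a minimal sketch of pushing rows into such a delivery stream with boto3; the stream name 'csv-to-s3' is hypothetical and it assumes the Firehose stream has already been created with your bucket as its S3 destination:

import boto3

# Assumes an existing Firehose delivery stream named 'csv-to-s3' (hypothetical)
# whose destination is your S3 bucket; Firehose batches these records into S3 objects.
firehose = boto3.client('firehose')

row = '8,,888888888888\n'  # one CSV line per record
firehose.put_record(
    DeliveryStreamName='csv-to-s3',
    Record={'Data': row.encode('utf-8')}
)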
There are three main ways to upload a file to Amazon S3: through the AWS Management Console, with the AWS CLI, or programmatically with an SDK such as boto3.
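The SDK route with boto3 is a one-liner once you have a local file; this is just a sketch with placeholder file, bucket, and key names:

import boto3

# Sketch of the SDK route; file, bucket, and key names are placeholders.
s3 = boto3.client('s3')
s3.upload_file('local.csv', 'dev-vs', 'foo/foobar.csv')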
When you upload large files to Amazon S3, it's a best practice to leverage multipart uploads. If you're using the AWS Command Line Interface (AWS CLI), then all high-level aws s3 commands automatically perform a multipart upload when the object is large. These high-level commands include aws s3 cp and aws s3 sync.
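If you're using the SDK instead of the CLI, boto3's transfer layer does the same thing; the sketch below assumes boto3 and uses placeholder file and bucket names, with an 8 MB threshold and part size:

import boto3
from boto3.s3.transfer import TransferConfig

# Objects above multipart_threshold are split into multipart_chunksize parts,
# mirroring what `aws s3 cp` does automatically for large files.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024)
s3 = boto3.client('s3')
s3.upload_file('big_file.csv', 'dev-vs', 'foo/big_file.csv', Config=config)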
You can upload any file type (images, backups, data, movies, etc.) into an S3 bucket. The maximum size of a file that you can upload by using the Amazon S3 console is 160 GB. To upload a file larger than 160 GB, use the AWS CLI, an AWS SDK, or the Amazon S3 REST API.
I did find a solution to my question, which I'll post here in case anyone else is interested. You can't stream directly to S3, but you can break the upload into the parts of a multipart upload. There is also a package, Smart Open (smart_open), that presents a streaming file interface and turns your writes into a multipart upload behind the scenes, which is what I used:
import smart_open
import io
import csv

testDict = [{"fieldA": "8", "fieldB": None, "fieldC": "888888888888"},
            {"fieldA": "9", "fieldB": None, "fieldC": "99999999999"}]
fieldnames = ['fieldA', 'fieldB', 'fieldC']

f = io.StringIO()
with smart_open.smart_open('s3://dev-test/bar/foo.csv', 'wb') as fout:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    fout.write(f.getvalue())
    for row in testDict:
        f.seek(0)
        f.truncate(0)
        writer.writerow(row)
        fout.write(f.getvalue())
f.close()
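Note that newer releases of smart_open expose smart_open.open instead of smart_open.smart_open, and in text mode it hands back a file-like stream you can give straight to csv, so the intermediate StringIO shuffle isn't needed. A rough sketch, using the same bucket, key, and data as above:

import csv
import smart_open

testDict = [{"fieldA": "8", "fieldB": None, "fieldC": "888888888888"},
            {"fieldA": "9", "fieldB": None, "fieldC": "99999999999"}]
fieldnames = ['fieldA', 'fieldB', 'fieldC']

# The returned stream is file-like and backed by a multipart upload,
# so csv can write to it directly.
with smart_open.open('s3://dev-test/bar/foo.csv', 'w') as fout:
    writer = csv.DictWriter(fout, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(testDict)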
We were trying to upload file contents to S3 when they came through as an InMemoryUploadedFile object in a Django request. We ended up doing the following because we didn't want to save the file locally. Hope it helps:
@action(detail=False, methods=['post'])
def upload_document(self, request):
    document = request.data.get('image').file
    s3.upload_fileobj(document, BUCKET_NAME, DESIRED_NAME_OF_FILE_IN_S3,
                      ExtraArgs={"ServerSideEncryption": "aws:kms"})
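upload_fileobj accepts any file-like object, not just Django's InMemoryUploadedFile, so the same call works with an in-memory buffer. A small sketch with placeholder bucket and key names:

import io
import boto3

# Any file-like object works; here an in-memory CSV buffer stands in
# for InMemoryUploadedFile.file. Bucket and key names are placeholders.
s3 = boto3.client('s3')
buffer = io.BytesIO(b"fieldA,fieldB,fieldC\n8,,888888888888\n")
s3.upload_fileobj(buffer, 'dev-vs', 'foo/foobar.csv',
                  ExtraArgs={"ServerSideEncryption": "aws:kms"})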