I am currently trying to write a DataFrame to a temp file and then upload that temp file to an S3 bucket. When I run my code, nothing seems to happen: no upload ever shows up in the bucket. Any help would be greatly appreciated. The following is my code:
import tempfile

import boto3
import pandas as pd

s3 = boto3.client('s3', aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key, region_name=region)

temp = tempfile.TemporaryFile(mode='w+')
largedf.to_csv(temp, sep='|')
s3.put_object(Body=temp, Bucket='[BUCKET NAME]', Key='test.txt')
temp.close()
The file handle you pass to s3.put_object is at its final position; when you .read from it, it returns an empty string.
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randint(10, 50, (5, 5)))
>>> temp = tempfile.TemporaryFile(mode='w+')
>>> df.to_csv(temp)
>>> temp.read()
''
A quick fix is to .seek back to the beginning:
>>> temp.seek(0)
0
>>> print(temp.read())
,0,1,2,3,4
0,11,42,40,45,11
1,36,18,45,24,25
2,28,20,12,33,44
3,45,39,14,16,20
4,40,16,22,30,37
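Applied to your code, the temp-file version would look something like this; a sketch assuming access_key, secret_key, region, and largedf are defined as in your snippet, with '[BUCKET NAME]' standing in for your real bucket:

import tempfile

import boto3

s3 = boto3.client('s3', aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key, region_name=region)

# Open in text mode ('w+') so pandas can write strings to the handle
with tempfile.TemporaryFile(mode='w+') as temp:
    largedf.to_csv(temp, sep='|')
    temp.seek(0)  # rewind so the read below starts from the beginning
    s3.put_object(Body=temp.read(), Bucket='[BUCKET NAME]', Key='test.txt')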
Note that writing to disk is unnecessary; you can keep everything in memory using a buffer:
from io import StringIO  # on Python 2, use: from cStringIO import StringIO

buffer = StringIO()
# Write the CSV into the in-memory buffer instead of a file on disk
df.to_csv(buffer)
# getvalue() returns the full contents regardless of the buffer's position
s3.put_object(Body=buffer.getvalue(), Bucket='[BUCKET NAME]', Key='test.txt')
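On Python 3 you can also hand boto3 a binary buffer directly via the client's upload_fileobj method, which reads from the buffer for you; a sketch under the same assumptions:

from io import BytesIO

buffer = BytesIO()
# to_csv() with no path returns the CSV as a string, which we encode to bytes
buffer.write(df.to_csv().encode('utf-8'))
buffer.seek(0)  # rewind before handing the buffer to boto3
s3.upload_fileobj(buffer, '[BUCKET NAME]', 'test.txt')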