I am attempting to read a .csv file uploaded to Django into a pandas DataFrame.
I am following the instructions on the Django REST Framework page for uploading files. When I PUT a .csv file to a defined endpoint, I end up with a Django UploadedFile object, in particular a TemporaryUploadedFile.
I am trying to read this object into a pandas DataFrame using read_csv; however, there is additional formatting around the temporary uploaded file. I am wondering how to read the original .csv file that was uploaded.
According to the DRF docs, I have assigned:
file_obj = request.data['file']
Inside of a Python debugging console, I see:
ipdb> file_obj
<TemporaryUploadedFile: foobar.csv (multipart/form-data; boundary=--------------------------044608164241682586561733)>
Things I've tried so far.
With the original file path, I can read it into pandas like this.
dataframe = pd.read_csv(open("foobar.csv", "rb"))
However, the uploaded temporary file has additional metadata added during the upload process, so the same call fails on it.
ipdb> pd.read_csv(open(file_obj.temporary_file_path(), "rb"))
*** pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 32
If I try to use the UploadedFile.read() method, I run into the following issue.
ipdb> dataframe = pd.read_csv(file_obj.read())
*** OSError: Expected file path name or file-like object, got <class 'bytes'> type
Thanks!
P.S. The first few lines of the original file look like this.
SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
When I look at the contents of the temporary file, I see this.
----------------------------789873173211443224653494
Content-Disposition: form-data; name="file"; filename="foobar.csv"
Content-Type: File
SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
From the Django docs: class TemporaryUploadedFile is a file uploaded to a temporary location (i.e. stream-to-disk). This class is used by the TemporaryFileUploadHandler.
UploadedFile.read() returns the file data as bytes, not a file path or file-like object. To use the pandas read_csv() function, you'll need to turn those bytes into a stream. Since your file is a CSV, the most straightforward way is to use bytes.decode() with io.StringIO(), like:
import io
dataframe = pd.read_csv(io.StringIO(file_obj.read().decode('utf-8')), delimiter=',')
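For anyone who wants to try the decode-and-wrap approach outside of a request, here is a minimal, self-contained sketch. The helper name read_uploaded_csv and the fake_upload object are hypothetical, and io.BytesIO is only a stand-in for the TemporaryUploadedFile (both return bytes from read()). It assumes the content is UTF-8 encoded. Note that decoding alone does not strip multipart boundary or header lines if the stored file contains them.

```python
import io

import pandas as pd


def read_uploaded_csv(uploaded_file, encoding="utf-8"):
    """Read a CSV from any object whose read() returns bytes (hypothetical helper)."""
    raw_bytes = uploaded_file.read()
    # Decode the bytes to text and wrap the text in an in-memory stream,
    # which is the kind of file-like object pd.read_csv() accepts.
    return pd.read_csv(io.StringIO(raw_bytes.decode(encoding)))


# Stand-in for the upload (assumption): io.BytesIO also returns bytes from read().
fake_upload = io.BytesIO(b"SPID,SA_ID,UOM\n(Blanked),123456789,KWH\n")
dataframe = read_uploaded_csv(fake_upload)
print(dataframe)
```

In the view, the equivalent call would presumably be read_uploaded_csv(request.data['file']).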