
Read a Django UploadedFile into a pandas DataFrame

Tags: pandas, django

I am attempting to read a .csv file uploaded to Django into a DataFrame.

I am following the instructions on the Django REST Framework page for uploading files. When I PUT a .csv file to a defined endpoint, I end up with a Django UploadedFile object, specifically a TemporaryUploadedFile.

I am trying to read this object into a pandas DataFrame using read_csv. However, there is additional formatting around the data in the temporary uploaded file. How can I read the original .csv file that was uploaded?

According to the DRF docs, I have assigned:

file_obj = request.data['file']

Inside of a Python debugging console, I see:

ipdb> file_obj                                                                                                                                                                            
<TemporaryUploadedFile: foobar.csv (multipart/form-data; boundary=--------------------------044608164241682586561733)>

Things I've tried so far.

With the original file path, I can read it into pandas like this.

dataframe = pd.read_csv(open("foobar.csv", "rb"))

However, the temporary file written during the upload has additional metadata around the CSV data, so the same approach fails.

ipdb> pd.read_csv(open(file_obj.temporary_file_path(), "rb"))                                                                                                                             
*** pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 32

If I try to use the UploadedFile.read() method, I run into the following issue.

ipdb> dataframe = pd.read_csv(file_obj.read())                                                                                                                                            
*** OSError: Expected file path name or file-like object, got <class 'bytes'> type

Thanks!

P.S. The first few lines of the original file look like this.

SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0

When I look at the contents of the temporary file, I see this.

----------------------------789873173211443224653494
Content-Disposition: form-data; name="file"; filename="foobar.csv"
Content-Type: File

SPID,SA_ID,UOM,DIR,DATE,RS,NAICS,APCT,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,11:00,12:00,13:00,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00:00
(Blanked),123456789,KWH,R,5/2/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0.144,1.064,3.07,4.531,4.013,5.205,4.751,4.647,3.142,2.464,1.173,0.023,0,0,0,0,0
(Blanked),123456789,KWH,R,3/10/18,H2ETOUAN,,100,0,0,0,0,0,0,0,0,0.007,0.622,0.179,0.003,0.274,0.167,0.014,0.004,0.028,0.139,0,0,0,0,0,0
Sean Chon asked Dec 03 '19 18:12



1 Answer

UploadedFile.read() returns the file data as bytes, not a file path or file-like object. To use pandas' read_csv() function, you need to turn those bytes into a stream. Since your file is a CSV, the most straightforward way is to decode the bytes with bytes.decode() and wrap the result in io.StringIO(), like:

import io
dataframe = pd.read_csv(io.StringIO(file_obj.read().decode('utf-8')), delimiter=',')
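
A minimal, self-contained sketch of this approach follows. It assumes the upload is UTF-8 text and uses a bytes literal (the hypothetical `raw_bytes`, with a shortened header row) as a stand-in for what `file_obj.read()` would return in a real view. It also shows a variant that wraps the bytes in `io.BytesIO` and leaves the decoding to pandas.

```python
import io

import pandas as pd

# Stand-in for the bytes returned by file_obj.read(); in a real view these
# would come from the uploaded file rather than from a literal.
raw_bytes = (
    b"SPID,SA_ID,UOM,DIR\n"
    b"(Blanked),123456789,KWH,R\n"
    b"(Blanked),123456789,KWH,R\n"
)

# Option 1: decode the bytes to text and wrap the text in a text stream.
text_stream = io.StringIO(raw_bytes.decode("utf-8"))
dataframe = pd.read_csv(text_stream)

# Option 2: wrap the raw bytes in a binary stream and let pandas decode them.
dataframe_from_bytes = pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")

print(dataframe.shape)  # (2, 4)
print(list(dataframe.columns))
```

Both options should produce the same DataFrame. Note that either one parses whatever bytes the uploaded file contains, so any extra lines stored in that file are passed to the parser as well.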
chemicollins answered Sep 20 '22 15:09
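
The temporary file contents shown in the question start with a boundary line and part headers, which looks like a multipart/form-data body stored as-is. If that is the case, decoding the bytes alone would still leave those lines in front of the CSV data. One possible way to pull out the file part is sketched below with the standard library's `email` parser. The function name `extract_uploaded_csv`, the boundary `XYZ`, and the sample body are hypothetical, and the sketch assumes the request's Content-Type value (including its boundary parameter) is available.

```python
import io
from email import message_from_bytes


def extract_uploaded_csv(raw_body: bytes, content_type: str) -> bytes:
    """Return the payload of the first file part in a multipart/form-data body."""
    # The email parser needs the Content-Type header (with its boundary
    # parameter) in front of the body to recognise the individual parts.
    header = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = message_from_bytes(header + raw_body)
    for part in message.walk():
        # Only the file part carries a filename in its Content-Disposition.
        if part.get_filename():
            return part.get_payload(decode=True)
    raise ValueError("no file part found in multipart body")


# Hypothetical request body, shaped like the temporary file in the question.
body = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="foobar.csv"\r\n'
    b"Content-Type: text/csv\r\n"
    b"\r\n"
    b"SPID,SA_ID,UOM,DIR\r\n"
    b"(Blanked),123456789,KWH,R\r\n"
    b"--XYZ--\r\n"
)

csv_bytes = extract_uploaded_csv(body, "multipart/form-data; boundary=XYZ")
csv_stream = io.StringIO(csv_bytes.decode("utf-8"))
print(csv_stream.readline().strip())
```

After rewinding it with `csv_stream.seek(0)`, the stream could be passed to `pd.read_csv`. Within Django REST Framework itself, the documentation recommends `MultiPartParser` rather than `FileUploadParser` for clients that send multipart uploads, which would likely avoid the manual parsing altogether.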