CSV to Feather in Pandas with Row Slicing

I am processing a huge CSV dataset (50 million rows). I am trying to slice it and save the slices in Feather format so that loading the data later uses less memory.

As a workaround, I loaded the CSV file in chunks and later merged the chunks into one DataFrame.
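A minimal sketch of the chunked load-and-merge workaround, assuming the file is read with `pd.read_csv(..., chunksize=...)`. With `chunksize` set, each chunk carries its slice of the running row labels (the second chunk starts at `chunksize`, not 0), which is why a single chunk written to Feather on its own would hit the same index error. `ignore_index=True` guarantees the merged frame gets a fresh 0-based index regardless of how the chunks were labelled. The helper name `load_in_chunks` and the small in-memory CSV are hypothetical stand-ins for the real 50-million-row file.

```python
import io

import pandas as pd

# Hypothetical small CSV standing in for the 50-million-row file.
CSV_TEXT = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(10))


def load_in_chunks(source, chunksize):
    """Read a CSV in chunks and merge them into one DataFrame."""
    chunks = []
    # With chunksize set, read_csv returns an iterator that yields DataFrames.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Each chunk keeps the running row labels (the second chunk starts at chunksize).
        chunks.append(chunk)
    # ignore_index=True gives the merged frame a fresh 0..n-1 index.
    return pd.concat(chunks, ignore_index=True)


if __name__ == "__main__":
    merged = load_in_chunks(io.StringIO(CSV_TEXT), chunksize=4)
    print(len(merged), merged.index.equals(pd.RangeIndex(len(merged))))
```

For a real file, pass a path instead of the `StringIO` buffer, e.g. `load_in_chunks("data.csv", chunksize=1_000_000)` (path and chunk size are illustrative).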

This is what I have tried so far:

df[2000000:4000000].to_feather('name')

I got the following error:

ValueError: feather does not support serializing a non-default index for the index; you can .reset_index() to make the index into column(s)

Then I tried to reset the index, but I still get the same error.
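One likely explanation (an assumption, since the exact reset call isn't shown): the index was reset on the full frame before slicing, or the result of `reset_index()` was never assigned back. In both cases the slice still carries its original row labels (2 and 3 in the tiny example below, 2000000 onwards in the question) instead of 0..n-1, which is the "non-default index" the error complains about. The helper `is_default_index` is hypothetical and only a rough stand-in for the condition pandas checks before raising this error.

```python
import pandas as pd


def is_default_index(df):
    """Rough stand-in for the check behind the error: labels must be exactly 0..len(df)-1."""
    return df.index.equals(pd.RangeIndex(len(df)))


# Hypothetical tiny frame standing in for the 50-million-row one.
df = pd.DataFrame({"value": range(6)})

# Resetting first and slicing afterwards: the slice still has labels 2 and 3.
reset_then_slice = df.reset_index(drop=True)[2:4]

# reset_index() returns a new frame; calling it without assigning leaves the slice unchanged.
sliced = df[2:4]
sliced.reset_index(drop=True)  # result discarded

# Slicing first and resetting afterwards: labels become 0 and 1.
slice_then_reset = df[2:4].reset_index(drop=True)

for name, frame in [("reset_then_slice", reset_then_slice),
                    ("sliced (unassigned reset)", sliced),
                    ("slice_then_reset", slice_then_reset)]:
    print(name, list(frame.index), is_default_index(frame))
```

Only the last variant, slicing first and then resetting (and keeping the result), produces a frame with a default 0-based index.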

MKJ asked Sep 06 '18 19:09


1 Answer

Try with .loc:

df.loc[2000000:4000000].reset_index().to_feather("./myfeather.ftr")

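You'll have to reset the index after slicing to save the DataFrame in Feather format, because the slice keeps its original row labels (on a default index they start at 2000000, not 0). Works for me.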
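A minimal sketch of the same idea wrapped in hypothetical helpers (`prepare_slice`, `save_slice_to_feather` and `save_in_slices` are not pandas functions). It differs deliberately from the one-liner above in two ways. First, it slices with `.iloc`, which is positional and excludes the stop row like `df[2000000:4000000]` in the question, whereas `.loc` is label-based and includes the end label, so `df.loc[2000000:4000000]` returns one extra row on a default index. Second, it uses `reset_index(drop=True)`, because plain `reset_index()` keeps the old labels as an extra `index` column. Writing Feather requires `pyarrow` to be installed; the slicing step is split out so it can be checked without it.

```python
import pandas as pd


def prepare_slice(df, start, stop):
    """Return rows [start, stop) by position with a fresh 0-based index."""
    # iloc is positional and stop-exclusive, like df[start:stop].
    # drop=True discards the old labels instead of turning them into an "index" column.
    return df.iloc[start:stop].reset_index(drop=True)


def save_slice_to_feather(df, start, stop, path):
    """Hypothetical helper: write one slice of df to a Feather file (needs pyarrow)."""
    prepare_slice(df, start, stop).to_feather(path)


def save_in_slices(df, step, path_template):
    """Hypothetical helper: write df as consecutive Feather files of at most `step` rows."""
    paths = []
    for start in range(0, len(df), step):
        path = path_template.format(start // step)
        save_slice_to_feather(df, start, start + step, path)
        paths.append(path)
    return paths
```

For example, `save_in_slices(df, 2_000_000, "./part_{}.ftr")` would write `part_0.ftr`, `part_1.ftr`, and so on (file names are illustrative), and a single part can later be loaded with `pd.read_feather("./part_1.ftr")`.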

Lue Mar answered Oct 06 '22 15:10