
Pickle dump Pandas DataFrame

This is a question from a lazy man.

I have a pandas DataFrame with 4 million rows and would like to save it as several smaller pickle files.

Why smaller chunks? To save/load them quicker.

My questions are:

1) Is there a better way (a built-in function) to save the DataFrame in smaller pieces than manually chunking it with np.array_split?

2) Is there a graceful way to glue the chunks back together when I read them, other than concatenating them manually?

Please feel free to suggest any data format other than pickle that is better suited to this job.
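For reference, the manual approach described in the question can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `save_chunks` and `load_chunks` and the `chunk_0000.pkl` file naming are hypothetical, and the sketch splits row positions with np.array_split rather than passing the DataFrame to it directly.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_chunks(df, directory, n_chunks):
    """Write df as n_chunks pickle files and return their paths in order.

    Hypothetical helper: the file naming scheme is an assumption.
    """
    paths = []
    # Split the row positions (not the frame itself) into n_chunks groups.
    for i, positions in enumerate(np.array_split(np.arange(len(df)), n_chunks)):
        path = os.path.join(directory, f"chunk_{i:04d}.pkl")
        df.iloc[positions].to_pickle(path)
        paths.append(path)
    return paths


def load_chunks(paths):
    """Read the pickled chunks back and concatenate them in the given order."""
    return pd.concat([pd.read_pickle(p) for p in paths])


# Small demonstration with 10 rows standing in for the 4 million.
demo = pd.DataFrame({"id": range(10), "value": [i * 0.5 for i in range(10)]})
with tempfile.TemporaryDirectory() as tmp:
    chunk_paths = save_chunks(demo, tmp, n_chunks=3)
    restored = load_chunks(chunk_paths)
```

Because each chunk keeps its original index, concatenating the chunks in the order they were written should reproduce the original DataFrame.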

asked May 26 '26 09:05 by aerin

1 Answer

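If the goal is to save and load quickly, you should look into using a SQL database rather than pickling. If your computer chokes when you ask it to write 4 million rows at once, you can specify a chunk size (for example, the chunksize parameter of DataFrame.to_sql) so the rows are written in batches.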

From there you can query slices of the data with standard SQL.
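A minimal sketch of that idea, assuming SQLite through Python's built-in sqlite3 module. The table name `measurements`, the column names, and the tiny chunk size are hypothetical and chosen only for illustration. It shows the mechanics of writing in batches and reading a slice back, and is not a benchmark against pickle.

```python
import sqlite3

import pandas as pd

# Small stand-in for the 4-million-row DataFrame.
df = pd.DataFrame({"id": range(10), "value": [i * 0.5 for i in range(10)]})

# An in-memory database keeps the example self-contained;
# pass a file path instead to persist the data.
conn = sqlite3.connect(":memory:")

# chunksize makes pandas write the rows in batches instead of all at once.
df.to_sql("measurements", conn, index=False, if_exists="replace", chunksize=4)

# Read back only the slice that is needed, using plain SQL.
subset = pd.read_sql_query(
    "SELECT id, value FROM measurements WHERE id >= 3 AND id < 6 ORDER BY id",
    conn,
)

# Count the stored rows to confirm that every batch was written.
total_rows = pd.read_sql_query(
    "SELECT COUNT(*) AS n FROM measurements", conn
)["n"].iloc[0]

conn.close()
```

With this layout there is nothing to glue together on the read side: each query returns one DataFrame containing just the requested rows.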

answered May 27 '26 22:05 by kpie