
Serialization of a pandas DataFrame


Is there a fast way to do serialization of a DataFrame?

I have a grid system which can run pandas analysis in parallel. In the end, I want to collect all the results (as a DataFrame) from each grid job and aggregate them into a giant DataFrame.

How can I save a DataFrame in a binary format that can be loaded rapidly?

James Bond asked Jun 06 '13 20:06



People also ask

Is a pandas DataFrame serializable?

Yes. Several serialization options exist, each with different strengths. Good serialization support for numeric data, combined with pandas categorical dtypes, enables efficient serialization and storage of DataFrames.

How do I write a pandas DataFrame to a JSON file?

Use the DataFrame.to_json() method, which converts the DataFrame to a JSON string. To export the DataFrame to a JSON file instead, pass a file path to to_json().


1 Answer

The easiest way is to use to_pickle, which saves the DataFrame as a pickle file (see the pickling section of the API docs); pd.read_pickle loads it back:

df.to_pickle(file_name) 
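For the grid use case in the question, a minimal sketch of the pickle round trip might look like the following. The helper names save_job_result and load_and_combine, and the idea of one pickle file per job, are assumptions made for illustration and are not part of the original answer.

```python
from pathlib import Path

import pandas as pd


def save_job_result(df: pd.DataFrame, path: Path) -> None:
    """Hypothetical helper: each grid job writes its result as a pickle."""
    df.to_pickle(path)


def load_and_combine(paths) -> pd.DataFrame:
    """Hypothetical helper: read every job's pickle and stack the rows."""
    frames = [pd.read_pickle(p) for p in paths]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

Each job would call save_job_result with its own file path, and the collector would pass the list of paths to load_and_combine. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.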

Another option is HDF5, which takes slightly more work to get started but is much richer for querying.
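As a rough sketch of the HDF5 route, assuming the optional PyTables package is installed (pandas needs it for HDF5 support): the helper names, the "results" key, and the job_id column are hypothetical and chosen only to match the grid scenario in the question.

```python
import pandas as pd


def append_job_result(df: pd.DataFrame, store_path: str, key: str = "results") -> None:
    """Hypothetical helper: append one job's rows to a shared HDF5 table."""
    df.to_hdf(
        store_path,
        key=key,
        format="table",            # the table format supports appending and querying
        append=True,
        data_columns=["job_id"],   # makes job_id usable in where= filters
    )


def load_job(store_path: str, job_id: int, key: str = "results") -> pd.DataFrame:
    """Hypothetical helper: read back only the rows belonging to one job."""
    return pd.read_hdf(store_path, key=key, where=f"job_id == {job_id}")
```

The where filter is the kind of querying the answer alludes to: only matching rows are read from disk. Note that concurrent writes to one HDF5 file from several processes are generally not safe, so in a parallel grid it may be simpler to have a single collector process do the appending.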

Andy Hayden answered Sep 19 '22 21:09
