Yesterday I learned the hard way that saving a pandas DataFrame to CSV for later use is a bad idea. I have a DataFrame of roughly 130k tweets, where each row holds a list of tweets. When I saved the data to CSV and then loaded the DataFrame back in, those lists had become plain strings. This led to all kinds of errors and a lot of debugging. Of course, it was a stupid mistake to assume that CSV would preserve information about which data structure type my data is.
My question now is: how do I save a DataFrame for later use in a way that preserves the data types of my columns and of the values stored in them?
I hope you found the solution you were looking for.
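To answer the question: you can use the DataFrame.to_pickle() method to serialize the DataFrame (that is, convert Python objects into a byte stream). When you deserialize the pickle file with pd.read_pickle(), you get the data back as it was, including the column dtypes and any Python objects, such as lists, stored in the cells. Keep in mind that pickle files can pose a security threat when they come from untrusted sources, because loading one can execute arbitrary code.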
Here's an example from the pandas documentation on how to use pickle:
>>> import pandas as pd
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)})
>>> original_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
>>> pd.to_pickle(original_df, "./dummy.pkl")
>>> unpickled_df = pd.read_pickle("./dummy.pkl")
>>> unpickled_df
   foo  bar
0    0    5
1    1    6
2    2    7
3    3    8
4    4    9
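The documentation example only uses integer columns, so here is a minimal sketch closer to the case in the question, where cells hold Python lists. The column names and sample values are made up for illustration and are not the asker's actual data. It compares a CSV round trip with a pickle round trip and checks the type of one cell after each.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the tweet data: each cell in "tweets" is a Python list.
df = pd.DataFrame({"user": ["a", "b"], "tweets": [["hi", "hello"], ["bye"]]})

# CSV round trip: each list is written out as its text representation,
# so it comes back as a string such as "['hi', 'hello']".
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
from_csv = pd.read_csv(csv_buffer)
csv_cell_type = type(from_csv.loc[0, "tweets"])

# Pickle round trip: the Python objects themselves are serialized,
# so the cell is still a list after loading.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tweets.pkl")
    df.to_pickle(path)
    from_pickle = pd.read_pickle(path)
pickle_cell_type = type(from_pickle.loc[0, "tweets"])

print(csv_cell_type)     # <class 'str'>
print(pickle_cell_type)  # <class 'list'>
```

Only load pickle files you created yourself or received from a source you trust.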