I am trying to pickle a DataFrame with:
import pandas as pd
from pandas import DataFrame
data = pd.read_table('Purchases.tsv',index_col='coreuserid')
data.to_pickle('Purchases.pkl')
I have been working with "data" for a while without any issues, so I know this is not a data corruption problem. I suspect a syntax error on my part, but I have tried a number of variants. I hesitate to post the whole error message, but it ends with:
\pickle.pyc in to_pickle(obj, path)
13 """
14 with open(path, 'wb') as f:
15 pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
SystemError: error return without exception set
The Purchases.pkl file is created, but if I call
data = pd.read_pickle('Purchases.pkl')
I get an EOFError. I am using Canopy 1.4, and therefore pandas 0.13.1, which should be recent enough to have this functionality.
DataFrame.to_pickle() function: to_pickle() pickles (serializes) an object to a file. Its path parameter is the file path where the pickled object will be stored. Its compression parameter is a string naming the compression to use in the output file; by default, it is inferred from the file extension of the specified path.
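As a rough illustration of the path and compression behaviour described above, here is a minimal sketch assuming a reasonably recent pandas (the compression option did not exist in 0.13.1). The sample frame, the temporary directory, and the file names are made up for the example and only stand in for the real Purchases data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for Purchases.tsv
data = pd.DataFrame(
    {"amount": [9.99, 24.50, 3.75]},
    index=pd.Index([101, 102, 103], name="coreuserid"),
)

# Write into a throwaway directory so the example leaves nothing behind
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "Purchases.pkl")
gzip_path = os.path.join(tmp_dir, "Purchases.pkl.gz")

# compression defaults to "infer": ".pkl" means no compression,
# while a ".gz" suffix selects gzip
data.to_pickle(plain_path)
data.to_pickle(gzip_path)

# read_pickle infers the compression from the extension in the same way
restored_plain = pd.read_pickle(plain_path)
restored_gzip = pd.read_pickle(gzip_path)
```

Both restored frames should compare equal to the original with data.equals(...).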
Pickle is a serialized way of storing a pandas DataFrame. Basically, you write the exact in-memory representation of the DataFrame to disk, so the column types and the indices are the same when you load it back. If you simply save the file as CSV, you store only comma-separated text, and the types have to be inferred again on reading.
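To make the difference concrete, the sketch below round-trips a small frame through pickle and through CSV and leaves the two results side by side for comparison. The column names and values are invented for illustration, and a recent pandas is assumed.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical frame with a datetime column, which CSV cannot represent natively
frame = pd.DataFrame(
    {
        "purchased_at": pd.to_datetime(["2014-01-05", "2014-02-11"]),
        "amount": [9.99, 24.50],
    },
    index=pd.Index([101, 102], name="coreuserid"),
)

# Pickle round trip: dtypes and index come back unchanged
pkl_path = os.path.join(tempfile.mkdtemp(), "frame.pkl")
frame.to_pickle(pkl_path)
from_pickle = pd.read_pickle(pkl_path)

# CSV round trip: everything passes through text, so read_csv must guess the
# types again, and without parse_dates the dates are not restored as datetimes
csv_text = frame.to_csv()
from_csv = pd.read_csv(io.StringIO(csv_text), index_col="coreuserid")
```

Comparing from_pickle.dtypes and from_csv.dtypes shows that only the pickled copy keeps the datetime column as a datetime.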
Pickle in Python is primarily used for serializing and deserializing a Python object structure. In other words, it is the process of converting a Python object into a byte stream in order to store it in a file or database, maintain program state across sessions, or transport data over the network.
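The byte-stream idea can be seen without pandas at all. This is a small sketch using only the standard library pickle module; the state dictionary is an invented example.

```python
import pickle

# Hypothetical program state to preserve across sessions
state = {"user": 101, "cart": ["book", "pen"], "total": 12.5}

# Serialize: Python object -> bytes (could be written to a file or sent over a socket)
payload = pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize: bytes -> a new, equal Python object
restored = pickle.loads(payload)
```

Here payload is a plain bytes object, and restored is an equal but separate copy of state.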
Fast forward a few years, and now it works fine. Thanks pandas ;)