Python Pandas to_pickle cannot pickle large dataframes

I have a dataframe "DF" with 500,000 rows. Here are the data types per column:

ID      int64
time    datetime64[ns]
data    object

Each entry in the "data" column is an array of shape [5, 500].
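
For reference, a scaled-down dataframe with the same schema can be built like this (the row count and random numbers are just stand-ins for the real data):

import numpy as np
import pandas as pd

# Toy stand-in for DF: same columns and dtypes, far fewer rows,
# random numbers in place of the real [5, 500] arrays.
n_rows = 1000
DF = pd.DataFrame({
    'ID': np.arange(n_rows, dtype='int64'),
    'time': pd.date_range('2015-01-01', periods=n_rows),
    'data': [np.random.rand(5, 500) for _ in range(n_rows)],
})
print(DF.dtypes)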

When I try to save this dataframe using

DF.to_pickle("my_filename.pkl")

it returns the following error:

     12     """
     13     with open(path, 'wb') as f:
---> 14         pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL) 

OSError: [Errno 22] Invalid argument

I also tried this method, but I get the same error:

import pickle


with open('my_filename.pkl', 'wb') as f:
    pickle.dump(DF, f)

I tried saving just 10 rows of this dataframe:

DF.head(10).to_pickle('test_save.pkl')

and there is no error at all. So it can save a small DF, but not the large one.

I am using Python 3 with IPython Notebook 3 on a Mac.

Please help me solve this problem. I really need to save this DF to a pickle file, and I cannot find a solution on the internet.

asked Apr 09 '15 by Joseph Roxas

2 Answers

Until there is a fix somewhere on the pickle/pandas side of things, I'd say a better option is to use an alternative IO backend. HDF is suitable for large datasets (GBs), so you don't need to add any extra split/combine logic.

df.to_hdf('my_filename.hdf','mydata',mode='w')

df = pd.read_hdf('my_filename.hdf','mydata')
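
To spell that out a bit more, a full round-trip might look like the sketch below. It assumes PyTables (the tables package) is installed; the file name and key are placeholders, and the keyword form key='mydata' is equivalent to the positional call above.

import pandas as pd

# Write DF to an HDF5 file under the key 'mydata'
# (mode='w' overwrites any existing file).
DF.to_hdf('my_filename.hdf', key='mydata', mode='w')

# Read it back later.
DF_restored = pd.read_hdf('my_filename.hdf', key='mydata')
assert len(DF_restored) == len(DF)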
answered Sep 18 '22 by volodymyr


Probably not the answer you were hoping for, but this is what I did...

Split the dataframe into smaller chunks using np.array_split (numpy functions are not guaranteed to work on dataframes, but this one does now, although there used to be a bug with it).

Then pickle the smaller dataframes.

When you unpickle them, use DataFrame.append or pandas.concat to glue everything back together (a sketch of the whole workflow is shown below).

I agree it is a fudge and suboptimal. If anyone can suggest a "proper" answer I'd be interested to see it, but I think it is as simple as dataframes not being meant to grow above a certain size.

Split a large pandas dataframe
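
A rough sketch of that workaround is below; the chunk count and file name pattern are arbitrary placeholder choices, and it assumes DF still fits in memory.

import numpy as np
import pandas as pd

def pickle_in_chunks(df, stem, n_chunks=20):
    # np.array_split copes with a row count that is not evenly divisible.
    for i, chunk in enumerate(np.array_split(df, n_chunks)):
        chunk.to_pickle('{}_{}.pkl'.format(stem, i))

def unpickle_chunks(stem, n_chunks=20):
    # Glue the pieces back together; ignore_index rebuilds a clean index.
    parts = [pd.read_pickle('{}_{}.pkl'.format(stem, i)) for i in range(n_chunks)]
    return pd.concat(parts, ignore_index=True)

pickle_in_chunks(DF, 'my_filename_part')
DF = unpickle_chunks('my_filename_part')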

answered Sep 22 '22 by Yupsiree