I've got a MacBook (Mac OS X 10.9) with 16 GB of RAM and two Pythons installed via Anaconda: 2.7.8 and 3.4.1. Both have the latest scikit-learn, 0.15.1. I am trying to run this simple code (just testing whether large matrices can be serialized):
import numpy as np
test_data = np.random.rand(10000, 60000)
print(test_data.nbytes / 2**30)
from sklearn.externals import joblib
joblib.dump(test_data, '/Users/va/Desktop/test_data.joblib')
Python 2.7.8 handles this fine, but Python 3.4.1 fails with the following error:
Failed to save <class 'numpy.ndarray'> to .npy file:
Traceback (most recent call last):
File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 240, in save
obj, filename = self._write_array(obj, filename)
File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 203, in _write_array
self.np.save(filename, array)
File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/numpy/lib/npyio.py", line 453, in save
format.write_array(fid, arr)
File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/numpy/lib/format.py", line 410, in write_array
fp.write(array.tostring('C'))
OSError: [Errno 22] Invalid argument
Traceback (most recent call last):
File "<ipython-input-3-90ed09e5c6d4>", line 1, in <module>
joblib.dump(test_data, '/Users/va/Desktop/test_data.joblib')
File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 368, in dump
pickler.dump(value)
File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 412, in dump
self.framer.end_framing()
File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 196, in end_framing
self.commit_frame(force=True)
File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 208, in commit_frame
write(data)
OSError: [Errno 22] Invalid argument
The problem appears to be the amount of data being stored: Python 3 handles np.random.rand(10000, 20000), which is about 1.5 GB, perfectly well.
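For reference, here is a quick way to compare the two array sizes without allocating them. It assumes 8 bytes per element (NumPy's default float64), and the 2 GiB line is only my guess at where the failure starts; nothing in the traceback confirms that exact threshold. The helper name is just for illustration.

```python
BYTES_PER_FLOAT64 = 8          # assumption: default dtype of np.random.rand
GIB = float(2**30)             # float so Python 2 does not truncate the result
SUSPECTED_LIMIT_GIB = 2.0      # assumption, not confirmed by the traceback

def array_size_gib(rows, cols, itemsize=BYTES_PER_FLOAT64):
    """Return the size in GiB of a rows x cols array of itemsize-byte elements."""
    return rows * cols * itemsize / GIB

for shape in [(10000, 20000), (10000, 60000)]:
    size = array_size_gib(*shape)
    side = "above" if size > SUSPECTED_LIMIT_GIB else "below"
    print("%s: %.2f GiB (%s the suspected limit)" % (shape, size, side))
```

The array that saves fine sits below that line and the failing one sits well above it.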
For completeness, plain pickle fails as well:
import pickle
with open('/Users/va/Desktop/test_data.pkl', 'wb') as f:
    pickle.dump(test_data, f, protocol=pickle.HIGHEST_PROTOCOL)
which results in:
Traceback (most recent call last):
File "<ipython-input-6-3f73f3011539>", line 3, in <module>
pickle.dump(test_data, f, protocol=pickle.HIGHEST_PROTOCOL)
OSError: [Errno 22] Invalid argument
On Windows 7, Python 3.4 works fine with both joblib and pickle.
Any suggestions on how to solve this problem with Python 3 on a Mac?
This happens to me too, on OS X 10.10 with Python 3.4.3 using pickle.
Instead, I started using zodbpickle (https://github.com/zopefoundation/zodbpickle), which is around 2-3 times slower but definitely works with sklearn classifiers.
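Another option that avoids an extra dependency is to pickle to bytes in memory and write the result in smaller pieces. This is only a minimal sketch: it assumes the failure comes from a single write() call being handed more than about 2 GiB at once, which fits the sizes reported in the question but is not proven by the traceback. The helper names and the 1 GiB chunk size are hypothetical choices, not part of pickle or joblib.

```python
import pickle

# Assumption: keeping each write()/read() call at 1 GiB stays safely under the limit.
MAX_CHUNK = 2**30

def dump_in_chunks(obj, path, chunk_size=MAX_CHUNK):
    """Pickle obj to bytes, then write those bytes in slices of chunk_size."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    view = memoryview(data)  # slicing a memoryview does not copy the data
    with open(path, 'wb') as f:
        for start in range(0, len(view), chunk_size):
            f.write(view[start:start + chunk_size])

def load_in_chunks(path, chunk_size=MAX_CHUNK):
    """Read the file back in slices of chunk_size, then unpickle the result."""
    buf = bytearray()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
    return pickle.loads(bytes(buf))
```

Usage would be dump_in_chunks(test_data, '/Users/va/Desktop/test_data.pkl') followed by load_in_chunks on the same path. The trade-off is memory: the pickled bytes live in RAM alongside the original object, so a 4.5 GB array needs roughly twice that while saving.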