 

Cannot load file containing pickled data - Python .npy I/O

Tags: python, io, numpy

I am trying to save a DataFrame and a matrix as .npy files with np.save() and then read them back with np.load(), but I get the following error:

  File "/Users/sofiafarina/opt/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 457, in load
    raise ValueError("Cannot load file containing pickled data "

ValueError: Cannot load file containing pickled data when allow_pickle=False

Even if I pass allow_pickle=True, I get an error:

  File "/Users/sofiafarina/opt/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 463, in load
    "Failed to interpret file %s as a pickle" % repr(file))

OSError: Failed to interpret file 'finaldf_p_85_12.npy' as a pickle

So how can I save a DataFrame in one Python script and load it in another? Should I use other functions? Thank you!

asked Feb 12 '20 15:02 by sofiafarina


4 Answers

I used the syntax below to load the .npy file and it worked.

import numpy as np
data = np.load("finaldf_p_85_12.npy", allow_pickle=True)

I think you need to add the allow_pickle=True parameter.
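allow_pickle=True only matters when the file really contains pickled data, for example an object array. A minimal sketch, assuming an in-memory buffer (io.BytesIO) in place of a .npy file on disk and a hypothetical ragged list as the payload:

```python
import io

import numpy as np

# A ragged nested list can only be stored as an object array,
# so np.save() pickles its elements (allow_pickle defaults to True when saving).
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)

buf = io.BytesIO()  # stands in for a .npy file on disk
np.save(buf, ragged)
buf.seek(0)

# Loading pickled contents requires opting in; only do this for trusted files.
loaded = np.load(buf, allow_pickle=True)
print(loaded.dtype, loaded[0], loaded[1])
```

This does not help when the file is not a valid .npy file or pickle at all, which is what the OSError in the question points to.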

answered Nov 18 '22 02:11 by Ajeet


TL;DR:

After hundreds of searches and hours of debugging, I found out that the issue was with Git LFS: my files had not been pulled by git-lfs.

git lfs install
git lfs pull

I think NumPy should report this situation more clearly.


I had exactly the same issue. The dtype in my .npz file was uint8, not object, so technically allow_pickle should not have been required. My NumPy version is 1.20.x.

With allow_pickle=False I got:

ValueError: Cannot load file containing pickled data when allow_pickle=False

And with allow_pickle=True I got:

OSError: Failed to interpret file 'finaldf_p_85_12.npy' as a pickle
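Both errors are consistent with a Git LFS pointer: np.load() inspects the first bytes of the file, and if they are neither the .npy magic string nor a ZIP header (.npz), it falls back to treating the file as a pickle. It then raises ValueError when allow_pickle=False and fails to unpickle otherwise. A pointer file is plain text, so it takes exactly that path. A minimal standard-library sketch of a check; the function name diagnose_npy_file and its return labels are hypothetical, and the LFS prefix assumes the v1 pointer format:

```python
from pathlib import Path

# Leading bytes used to tell the formats apart.
NPY_MAGIC = b"\x93NUMPY"  # every .npy file starts with this
ZIP_MAGIC = b"PK\x03\x04"  # .npz files are ZIP archives
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"  # Git LFS pointer text


def diagnose_npy_file(path):
    """Hypothetical helper: guess what np.load() will find in `path`."""
    with Path(path).open("rb") as f:
        head = f.read(len(LFS_PREFIX))
    if head.startswith(NPY_MAGIC):
        return "npy"
    if head.startswith(ZIP_MAGIC):
        return "npz"
    if head.startswith(LFS_PREFIX):
        return "git-lfs-pointer"  # the real content was never pulled
    return "unknown"  # np.load() would try to treat this as a pickle
```

For example, diagnose_npy_file("finaldf_p_85_12.npy") returning "git-lfs-pointer" would suggest running the git lfs commands from the TL;DR.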

answered Nov 18 '22 03:11 by Anshul Lodha


Python's native data serialization module is pickle. NumPy stores object arrays (dtype=object), such as arrays of arbitrary Python objects or ragged nested lists, using pickle, and the numpy.load documentation warns against loading pickled data:

Warning: Loading files that contain object arrays uses the pickle module, which is not secure against erroneous or maliciously constructed data. Consider passing allow_pickle=False to load data that is known not to contain object arrays for the safer handling of untrusted sources.

You might be saving an array that contains a single DataFrame, which causes pickling. For example:

import numpy as np

x = np.array([[0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1]])

In that case, try saving just the inner NumPy array with np.save(filename, x[0]). That stores the data without pickling and should resolve the issue.
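A minimal sketch of the difference, assuming an in-memory buffer instead of a file and a hypothetical 7x3 float matrix wrapped in a one-element object array (standing in for an array that holds a single DataFrame, so pandas is not needed):

```python
import io

import numpy as np


def save_and_load(arr, allow_pickle):
    """Hypothetical helper: round-trip `arr` through the .npy format in memory."""
    buf = io.BytesIO()
    np.save(buf, arr)  # object arrays are pickled here; numeric arrays are not
    buf.seek(0)
    return np.load(buf, allow_pickle=allow_pickle)


matrix = np.full((7, 3), 0.1)  # plain float64 data

# Wrapping the matrix in an object array forces its payload through pickle.
wrapped = np.empty(1, dtype=object)
wrapped[0] = matrix

try:
    save_and_load(wrapped, allow_pickle=False)
    wrapped_needs_pickle = False
except ValueError:  # object arrays are refused when allow_pickle=False
    wrapped_needs_pickle = True

# Saving only the inner numeric array avoids pickling entirely.
restored = save_and_load(wrapped[0], allow_pickle=False)
print(wrapped_needs_pickle, restored.shape, restored.dtype)
```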

answered Nov 18 '22 03:11 by deeksha_g


The OSError suggests you might be having a Python 2/Python 3 issue. I had the same problem and the same errors when I tried to read, in Python 3, a file that had been written in Python 2. For me, calling np.load with the following arguments worked:

import numpy as np
data = np.load("file.npy", allow_pickle=True, fix_imports=True, encoding="latin1")

The numpy.load documentation says of the encoding argument: "Only useful when loading Python 2 generated pickled files in Python 3, which includes npy/npz files containing object arrays."
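The encoding argument is handed to Python's unpickler, which has to decide how to turn Python 2 byte strings into Python 3 objects. A minimal standard-library sketch of why the default fails; the byte string is a hand-built protocol-2 pickle of the Python 2 str 'caf\xe9', assumed to resemble what Python 2 would emit, and load_py2_pickle is a hypothetical helper:

```python
import pickle

# PROTO 2, SHORT_BINSTRING of 4 bytes ("caf\xe9"), BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."


def load_py2_pickle(data, encoding):
    """Unpickle Python 2 data in Python 3, decoding old str objects with `encoding`."""
    return pickle.loads(data, encoding=encoding)


try:
    load_py2_pickle(PY2_PICKLE, encoding="ASCII")  # Python 3's default
    ascii_failed = False
except UnicodeDecodeError:  # byte 0xe9 is not valid ASCII
    ascii_failed = True

as_text = load_py2_pickle(PY2_PICKLE, encoding="latin1")  # maps each byte 1:1
as_bytes = load_py2_pickle(PY2_PICKLE, encoding="bytes")  # keeps raw bytes
print(ascii_failed, as_text, as_bytes)
```

latin1 maps every byte value to exactly one character, so no byte is lost; that is presumably why it is the usual choice for old NumPy data.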

answered Nov 18 '22 03:11 by Tom F