
NumPy: consequences of using 'np.save()' with 'allow_pickle=False'

Tags: python, numpy

According to the NumPy documentation for np.save, arrays are saved with allow_pickle=True by default. The documentation also explains what can be problematic about this default:

allow_pickle : bool, optional
Allow saving object arrays using Python pickles. Reasons for disallowing pickles include security (loading pickled data can execute arbitrary code) and portability (pickled objects may not be loadable on different Python installations, for example if the stored objects require libraries that are not available, and not all pickled data is compatible between Python 2 and Python 3).
Default: True

After reading this, I would of course prefer to use allow_pickle=False, but the documentation does not say what changes when it is used. There must be some reason allow_pickle=True is the default despite its disadvantages.

Do you use allow_pickle=False, and how does it behave differently?

SalatYerakot asked Jan 17 '17 11:01




1 Answer

An object array is just a normal numpy array where the dtype is object; this happens if the contents of the array aren't of the normal numerical types (like int or float, etc.). We can try out saving a numpy array with objects, just to test how this works. A simple kind of object would be a dict:

>>> import numpy as np
>>> a = np.array([{x: 1} for x in range(4)])
>>> a
array([{0: 1}, {1: 1}, {2: 1}, {3: 1}], dtype=object)
>>> np.save('test.pkl', a)

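Loading this back works too, as long as pickles are allowed on load. Since NumPy 1.16.3, np.load defaults to allow_pickle=False, so it has to be passed explicitly (note that np.save appended the .npy extension to the file name):

>>> np.load('test.pkl.npy', allow_pickle=True)
array([{0: 1}, {1: 1}, {2: 1}, {3: 1}], dtype=object)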

The array can't be saved without using pickle, though:

>>> np.save('test.pkl', a, allow_pickle=False)
...
ValueError: Object arrays cannot be saved when allow_pickle=False
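
For ordinary numeric arrays, allow_pickle=False makes no difference, because such arrays are written in the plain .npy binary format and never touch pickle. The following is a minimal sketch of that contrast; the temporary directory and file names are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical file names inside a throwaway directory.
tmpdir = tempfile.mkdtemp()
numeric_path = os.path.join(tmpdir, "numeric.npy")
object_path = os.path.join(tmpdir, "objects.npy")

# A plain numeric array does not need pickle, so this save succeeds.
numeric = np.arange(6, dtype=np.float64).reshape(2, 3)
np.save(numeric_path, numeric, allow_pickle=False)
restored = np.load(numeric_path, allow_pickle=False)
roundtrip_ok = np.array_equal(numeric, restored)

# An object array is rejected at save time instead of being pickled.
objects = np.array([{x: 1} for x in range(4)], dtype=object)
try:
    np.save(object_path, objects, allow_pickle=False)
    save_error = None
except ValueError as exc:
    save_error = str(exc)

print(roundtrip_ok)
print(save_error)
```

So in practice allow_pickle=False only changes behavior for arrays with dtype=object: they raise an error instead of being silently pickled.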

The rule of thumb for pickles is that you're safe if you're loading a pickle that you made, but you should be careful about loading pickles that you got from somewhere else. For one thing, if you don't have the same libraries (or library versions) installed that were used to make the pickle, you might not be able to load the pickle (this is what's meant by portability above). Security is another potential concern; you can read a bit about how pickles can be abused in this article, for instance.
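
The same flag exists on the loading side, which is where the security concern actually applies. As a rough sketch (the file name is hypothetical, and the file is created locally here only to stand in for one received from elsewhere), np.load with allow_pickle=False refuses to unpickle an object array rather than running whatever the pickle contains:

```python
import os
import tempfile

import numpy as np

# Stand-in for a file of unknown origin; created locally for the demo.
path = os.path.join(tempfile.mkdtemp(), "untrusted.npy")
np.save(path, np.array([{"a": 1}], dtype=object), allow_pickle=True)

# With pickles disallowed, NumPy raises instead of unpickling the contents.
try:
    np.load(path, allow_pickle=False)
    refused = False
except ValueError:
    refused = True

# Opt in to unpickling only for files whose origin you trust.
trusted = np.load(path, allow_pickle=True)

print(refused)
print(trusted[0])
```

Keeping allow_pickle=False on load means a file has to contain plain array data to be read at all, which is a reasonable default for data from other people.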

wildwilhelm answered Oct 25 '22 22:10
