Here is an example:
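import numpy as np

randoms = np.random.randint(0, 20, 10000000)
# np.int and np.object were aliases for the builtins int and object;
# both aliases were removed in NumPy 1.24, so the builtins are used here.
a = randoms.astype(int)
b = randoms.astype(object)
np.save('d:/dtype=int.npy', a)  # 39 MB
np.save('d:/dtype=object.npy', b)  # 19 MB!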
You can see that the file with dtype=object is about half the size. How come? I was under the impression that properly defined numpy dtypes are strictly better than object dtypes.
With a non-object dtype, most of the npy file format consists of a dump of the raw bytes of the array's data. That'd be either 4 or 8 bytes per element here, depending on whether your NumPy defaults to 4- or 8-byte integers. From the file size, it looks like 4 bytes per element.
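To check the raw-dump claim yourself, here is a minimal sketch that saves to an in-memory buffer instead of a file. The helper name saved_size, the shorter array length, and the explicit int32/int64 dtypes are my own choices for illustration, not something from the question.

```python
import io

import numpy as np


def saved_size(arr):
    """Return the number of bytes np.save writes for arr (hypothetical helper)."""
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getbuffer().nbytes


n = 100_000  # assumption: shorter than the question's 10,000,000 to keep it quick
values = np.random.randint(0, 20, n)

for dtype in (np.int32, np.int64):
    arr = values.astype(dtype)
    size = saved_size(arr)
    # Whatever is not raw element data is the small .npy header.
    print(dtype.__name__, "data bytes:", arr.nbytes,
          "file bytes:", size, "header bytes:", size - arr.nbytes)
```

At 4 bytes per element, 10,000,000 elements come to 40,000,000 bytes of data, which is consistent with the roughly 39 MB file in the question.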
With an object dtype, most of the npy file format consists of an ordinary pickle of the array. For small integers, such as those in your array, the pickle uses the K pickle opcode, long name BININT1, "documented" in the pickletools module:
    I(name='BININT1',
      code='K',
      arg=uint1,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a one-byte unsigned integer.
      This is a space optimization for pickling very small non-negative ints,
      in range(256).
      """),
This requires two bytes per integer, one for the K opcode and one byte of unsigned integer data.
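You can see the two-bytes-per-integer cost directly with the standard pickle and pickletools modules. This is only a sketch: the value 7, the list lengths, and protocol 2 are assumptions picked for illustration, and the pickle NumPy writes for an object array also carries some extra bytes describing the array itself.

```python
import pickle
import pickletools

# Assumption: protocol 2 for illustration; BININT1 exists from protocol 1 onward.
payload = pickle.dumps(7, protocol=2)

# Disassemble the pickle into (opcode name, argument) pairs.
opcodes = [(op.name, arg) for op, arg, pos in pickletools.genops(payload)]
print(opcodes)

# Pickle two lists of small ints that differ by 100 elements and
# measure how many bytes each extra integer costs.
small = pickle.dumps([7] * 100, protocol=2)
large = pickle.dumps([7] * 200, protocol=2)
per_int = (len(large) - len(small)) / 100
print("bytes per small integer:", per_int)
```

Two bytes times 10,000,000 integers is 20,000,000 bytes, which lines up with the roughly 19 MB object file.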
Note that you could have cut down the file size even further by storing your array with dtype numpy.int8 or numpy.uint8, for roughly 1 byte per integer.
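As a rough sketch of that suggestion, the snippet below converts the values to numpy.uint8 and saves them to an in-memory buffer. The array length and the BytesIO buffer are assumptions for illustration; the question's values (0 to 19) all fit in a single byte, so the conversion loses nothing.

```python
import io

import numpy as np

# Assumption: a shorter array than in the question, to keep the example quick.
values = np.random.randint(0, 20, 100_000)
compact = values.astype(np.uint8)  # values 0..19 fit in one unsigned byte

buf = io.BytesIO()
np.save(buf, compact)
file_bytes = buf.getbuffer().nbytes

# Check that the narrower dtype did not change any value.
lossless = np.array_equal(compact, values)

print("bytes per element:", compact.itemsize)
print("data bytes:", compact.nbytes, "file bytes:", file_bytes)
print("lossless:", lossless)
```

For the 10,000,000-element array in the question, that works out to about 10 MB on disk.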