I am using the array module to store a large number of unsigned 32-bit ints (many gigabytes' worth). Rather than using 4 bytes per element, Python is using 8 bytes, as indicated by array.itemsize and verified by pympler. For example:
>>> array("L", range(10)).itemsize
8
I have a large number of elements, so I would benefit from storing each one in 4 bytes.
Numpy will let me store the values as unsigned 32 bit ints:
>>> np.array(range(10), dtype = np.uint32).itemsize
4
The problem is that element access through numpy's index operator is about twice as slow, so anything that can't be expressed as a numpy vector operation suffers. For example:
python3 -m timeit -s "from array import array; a = array('L', range(1000))" "for i in range(len(a)): a[i]"
10000 loops, best of 3: 51.4 usec per loop
vs
python3 -m timeit -s "import numpy as np; a = np.array(range(1000), dtype = np.uint32)" "for i in range(len(a)): a[i]"
10000 loops, best of 3: 90.4 usec per loop
So it seems I must either use twice as much memory as I would like, or run twice as slow as I would like. Is there a way around this? Can I force Python arrays to use a specified itemsize?
If you want to stick to using array, set the typecode to I (unsigned int) rather than L (unsigned long):
>>> array.array("I", range(10)).itemsize
4
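Note that the C types behind the typecodes are platform-dependent, so it is worth checking the sizes on your machine. A minimal sketch to do that (just iterating over the integer typecodes and printing their itemsize):

>>> from array import array
>>> for typecode in "bBhHiIlLqQ":
...     # 'l'/'L' are often 8 bytes on 64-bit Linux/macOS but 4 on Windows
...     print(typecode, array(typecode).itemsize)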
That said, I would be very surprised if there weren't a way to speed up your calculations by far more than the 2x you are losing to numpy's indexing. It's hard to tell without knowing exactly what you are doing with those values.
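For instance, if the per-element work can be phrased as whole-array operations, the gain from vectorizing usually dwarfs the 2x penalty of scalar indexing. A rough sketch (using a simple sum as a stand-in for your actual computation, which may or may not vectorize this way):

import timeit

import numpy as np

a = np.arange(1_000_000, dtype=np.uint32)

def loop_sum(arr):
    # Element-by-element access: each arr[i] boxes a numpy scalar, which is slow
    total = 0
    for i in range(len(arr)):
        total += arr[i]
    return total

def vector_sum(arr):
    # Whole-array operation: the loop runs in C inside numpy
    return int(arr.sum())

print("loop:  ", timeit.timeit(lambda: loop_sum(a), number=1))
print("vector:", timeit.timeit(lambda: vector_sum(a), number=1))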