
Can I force python array elements to have a specific size?

I am using the array module to store a large quantity (many gigabytes) of unsigned 32-bit ints. Rather than using 4 bytes per element, Python uses 8 bytes, as indicated by array.itemsize and verified with pympler.

e.g.:

>>> array("L", range(10)).itemsize
8

I have a large number of elements, so I would benefit from storing them within 4 bytes.
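For reference, the width behind each array typecode is platform-dependent (on a typical 64-bit Linux build 'L' maps to an 8-byte unsigned long, while on Windows it is usually 4 bytes). A quick sketch to check what your build uses:

from array import array

# Print the byte width of each unsigned integer typecode on this platform.
# On a typical 64-bit Linux build this shows H=2, I=4, L=8, Q=8; the exact
# values depend on the underlying C types.
for typecode in ("H", "I", "L", "Q"):
    print(typecode, array(typecode).itemsize)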

NumPy will let me store the values as unsigned 32-bit ints:

>>> import numpy as np
>>> np.array(range(10), dtype=np.uint32).itemsize
4

But the problem is that element access through NumPy's index operator is roughly twice as slow, so any operation that can't be expressed as one of NumPy's vectorized operations pays that penalty. For example:

python3 -m timeit -s "from array import array; a = array('L', range(1000))" "for i in range(len(a)): a[i]"
10000 loops, best of 3: 51.4 usec per loop

vs

python3 -m timeit -s "import numpy as np; a = np.array(range(1000), dtype = np.uint32)" "for i in range(len(a)): a[i]"
10000 loops, best of 3: 90.4 usec per loop

So I am forced to either use twice as much memory as I would like, or accept a program that runs about twice as slow as I would like. Is there a way around this? Can I force Python arrays to use a specified itemsize?

asked Apr 25 '16 by Paul

1 Answer

If you want to stick to using array, set the typecode to I (unsigned int) rather than L (unsigned long):

>>> array.array("I", range(10)).itemsize
4
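As a rough check of the memory savings (a sketch, assuming a 64-bit Linux build where 'L' is 8 bytes and 'I' is 4 bytes):

import sys
from array import array

n = 10_000_000  # ten million elements, just for illustration

# 'L' stores 8 bytes per element on this platform; 'I' stores 4.
# Values that don't fit in 32 bits make array('I', ...) raise OverflowError.
a_long = array("L", range(n))
a_uint = array("I", range(n))

print(a_long.itemsize * len(a_long))  # ~80 MB of raw element storage
print(a_uint.itemsize * len(a_uint))  # ~40 MB of raw element storage
print(sys.getsizeof(a_long), sys.getsizeof(a_uint))  # buffer plus object overhead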

That said, I would be very surprised if there weren't a way to speed up your calculations by far more than the 2x you are losing by using numpy. It's hard to tell without knowing exactly what you are doing with those values.
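For example, if the per-element work can be expressed as whole-array operations, a sketch like the one below (hypothetical; your actual computation may not map onto it this directly) replaces the Python-level loop with a single vectorized expression:

import numpy as np

a = np.arange(1_000_000, dtype=np.uint32)

# Pure-Python loop: every a[i] boxes the element into a NumPy scalar object,
# which is why per-element indexing is slower than with array('I').
count_loop = 0
for i in range(len(a)):
    if a[i] > 500_000:
        count_loop += 1

# Vectorized equivalent: the comparison and the count both run in C over the
# whole buffer, typically far faster than the loop above.
count_vec = int(np.count_nonzero(a > 500_000))

assert count_loop == count_vec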

answered Oct 10 '22 by Jaime