NumPy arrays are great for both performance and ease of use (slicing and indexing are easier than with lists).

I tried to build a data container out of a NumPy structured array instead of a dict of NumPy arrays. The problem is that the performance is much worse: about 2.5 times slower for homogeneous data and about 32 times slower for heterogeneous data (I'm talking about the NumPy datatypes).

Is there a way to speed the structured arrays up? I tried changing the memory order from 'C' to 'F', but this didn't have any effect.
Here's my profiling code:
import time
import numpy as np

NP_SIZE = 100000
N_REP = 100

np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='C')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='C')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}

t0 = time.time()
for i in range(N_REP):
    np_homo['a'] += i

t1 = time.time()
for i in range(N_REP):
    np_hetro['a'] += i

t2 = time.time()
for i in range(N_REP):
    dict_homo['a'] += i

t3 = time.time()
for i in range(N_REP):
    dict_hetro['a'] += i

t4 = time.time()

print('Homogeneous Numpy struct array took {:.4f}s'.format(t1 - t0))
print('Heterogeneous Numpy struct array took {:.4f}s'.format(t2 - t1))
print('Homogeneous Dict of numpy arrays took {:.4f}s'.format(t3 - t2))
print('Heterogeneous Dict of numpy arrays took {:.4f}s'.format(t4 - t3))
Edit: I forgot to include my timing numbers:

Homogeneous Numpy struct array took 0.0101s
Heterogeneous Numpy struct array took 0.1367s
Homogeneous Dict of numpy arrays took 0.0042s
Heterogeneous Dict of numpy arrays took 0.0042s
Edit 2: I added some additional test cases using the timeit module:
import numpy as np
import timeit

NP_SIZE = 1000000


def time(data, txt, n_rep=1000):
    def intern():
        data['a'] += 1
    time = timeit.timeit(intern, number=n_rep)
    print('{} {:.4f}'.format(txt, time))


np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='C')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='C')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}

time(np_homo, 'Homogeneous Numpy struct array')
time(np_hetro, 'Heterogeneous Numpy struct array')
time(dict_homo, 'Homogeneous Dict of numpy arrays')
time(dict_hetro, 'Heterogeneous Dict of numpy arrays')
results in:
Homogeneous Numpy struct array 0.7989
Heterogeneous Numpy struct array 13.5253
Homogeneous Dict of numpy arrays 0.3750
Heterogeneous Dict of numpy arrays 0.3744
The ratios between the runs seem reasonably stable across both timing methods and different array sizes.
In case it matters: Python 3.4, NumPy 1.9.2.
In my quick timing tests the difference isn't that large:
In [717]: dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
In [718]: timeit dict_homo['a']+=1
10000 loops, best of 3: 25.9 µs per loop
In [719]: np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
In [720]: timeit np_homo['a'] += 1
10000 loops, best of 3: 29.3 µs per loop
In the dict_homo case, the fact that the array is embedded in a dictionary is a minor point. Simple dictionary access like this is fast, basically the same as accessing the array by variable name. So the first case is basically a test of += for a 1d array.
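To make that concrete, here's a minimal timeit sketch of my own (not part of the original benchmark) comparing += on a bare 1d array against the same += reached through a dict key:

import timeit

setup = 'import numpy as np; bare = np.zeros(10000); d = {"a": np.zeros(10000)}'

# Both statements do the same amount of array arithmetic; the dict lookup
# only adds a small constant overhead per call.
t_bare = timeit.timeit('bare += 1', setup=setup, number=10000)
t_dict = timeit.timeit('d["a"] += 1', setup=setup, number=10000)
print('bare array: {:.4f}s  dict entry: {:.4f}s'.format(t_bare, t_dict))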
In the structured case, the a and b values alternate in the data buffer, so np_homo['a'] is a strided view that pulls out every other number. So it's not surprising that it's a bit slower.
In [721]: np_homo
Out[721]:
array([(41111.0, 0.0), (41111.0, 0.0), (41111.0, 0.0), ..., (41111.0, 0.0),
(41111.0, 0.0), (41111.0, 0.0)],
dtype=[('a', '<f8'), ('b', '<f8')])
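One way to see the difference in memory layout (a quick check of my own, not from the timings above) is to compare the strides and contiguity of the field view with a plain 1d array:

import numpy as np

np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
plain = np.zeros(10000)

# The field view has to step over both fields (16 bytes) to reach the next
# 'a' value, so it is not contiguous; the plain array steps 8 bytes.
print(np_homo['a'].strides, np_homo['a'].flags['C_CONTIGUOUS'])   # (16,) False
print(plain.strides, plain.flags['C_CONTIGUOUS'])                 # (8,) True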
A 2d array also interleaves the column values.
In [722]: np_twod=np.zeros((10000,2), np.double)
In [723]: timeit np_twod[:,0]+=1
10000 loops, best of 3: 36.8 µs per loop
Surprisingly it's actually a bit slower than the structured case. Using order='F' or a (2, 10000) shape speeds it up a bit, but still not quite as good as the structured case.
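For completeness, a small sketch of those two variants (my own timing, results will vary by machine):

import timeit

# Column access on a Fortran-ordered array, or row access on a (2, 10000)
# array, touches contiguous memory, which is what gives the small speedup.
setup_f = 'import numpy as np; a = np.zeros((10000, 2), np.double, order="F")'
setup_r = 'import numpy as np; a = np.zeros((2, 10000), np.double)'

print('F order, column  :', timeit.timeit('a[:, 0] += 1', setup=setup_f, number=10000))
print('(2, N) shape, row:', timeit.timeit('a[0, :] += 1', setup=setup_r, number=10000))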
These are small test times, so I won't make grand claims. But the structured array doesn't look bad.
Another set of timing tests, this time initializing the array or dictionary fresh for each run:
In [730]: %%timeit np_twod=np.zeros((10000,2), np.double)
np_twod[:,0] += 1
.....:
10000 loops, best of 3: 36.7 µs per loop
In [731]: %%timeit np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
np_homo['a'] += 1
.....:
10000 loops, best of 3: 38.3 µs per loop
In [732]: %%timeit dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
dict_homo['a'] += 1
.....:
10000 loops, best of 3: 25.4 µs per loop
2d and structured are closer, with somewhat better performance for the dictionary (1d) case. I tried this with np.ones as well, since np.zeros can have delayed allocation, but no difference in behavior.
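If you want to reproduce that zeros-vs-ones check outside IPython, a rough equivalent with the timeit module (shown for the 2d case; my own sketch) looks like this:

import timeit

# The array is rebuilt in setup for each timing call, so any lazy allocation
# by np.zeros would show up in the first timed += rather than being hidden.
for init in ('np.zeros', 'np.ones'):
    setup = 'import numpy as np; a = {}((10000, 2), np.double)'.format(init)
    t = timeit.timeit('a[:, 0] += 1', setup=setup, number=10000)
    print('{:8s} {:.4f}s'.format(init, t))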