
Speed up structured NumPy array

Tags:

performance

python-3.x

numpy

NumPy arrays are great for both performance and ease of use (easier slicing and indexing than lists).

I'm trying to build a data container from a NumPy structured array instead of a dict of NumPy arrays. The problem is that the performance is much worse: about 2.5 times slower with homogeneous data and about 32 times slower with heterogeneous data (I'm talking about NumPy datatypes).

Is there a way to speed up structured arrays? I tried changing the memory order from 'c' to 'f', but this didn't have any effect.

Here's my profiling code:

import time
import numpy as np

NP_SIZE = 100000
N_REP = 100

np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}

t0 = time.time()
for i in range(N_REP):
    np_homo['a'] += i

t1 = time.time()
for i in range(N_REP):
    np_hetro['a'] += i

t2 = time.time()
for i in range(N_REP):
    dict_homo['a'] += i

t3 = time.time()
for i in range(N_REP):
    dict_hetro['a'] += i
t4 = time.time()

print('Homogeneous Numpy struct array took {:.4f}s'.format(t1 - t0))
print('Heterogeneous Numpy struct array took {:.4f}s'.format(t2 - t1))
print('Homogeneous Dict of numpy arrays took {:.4f}s'.format(t3 - t2))
print('Heterogeneous Dict of numpy arrays took {:.4f}s'.format(t4 - t3))

Edit: I forgot to include my timing numbers:

Homogeneous Numpy struct array took 0.0101s
Heterogeneous Numpy struct array took 0.1367s
Homogeneous Dict of numpy arrays took 0.0042s
Heterogeneous Dict of numpy arrays took 0.0042s

Edit2: I added an additional test case using the timeit module:

import numpy as np
import timeit

NP_SIZE = 1000000

def time_case(data, txt, n_rep=1000):
    # Time n_rep in-place additions on field/key 'a' of the given container.
    def intern():
        data['a'] += 1

    elapsed = timeit.timeit(intern, number=n_rep)
    print('{} {:.4f}'.format(txt, elapsed))


np_homo = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.double)], order='c')
np_hetro = np.zeros(NP_SIZE, dtype=[('a', np.double), ('b', np.int32)], order='c')
dict_homo = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE)}
dict_hetro = {'a': np.zeros(NP_SIZE), 'b': np.zeros(NP_SIZE, np.int32)}

time_case(np_homo, 'Homogeneous Numpy struct array')
time_case(np_hetro, 'Heterogeneous Numpy struct array')
time_case(dict_homo, 'Homogeneous Dict of numpy arrays')
time_case(dict_hetro, 'Heterogeneous Dict of numpy arrays')

results in:

Homogeneous Numpy struct array 0.7989
Heterogeneous Numpy struct array 13.5253
Homogeneous Dict of numpy arrays 0.3750
Heterogeneous Dict of numpy arrays 0.3744

The ratios between the runs seem reasonably stable across both timing methods and different array sizes.

In case it matters: Python 3.4, NumPy 1.9.2.


asked Jan 21 '16 18:01

magu_

1 Answer

In my quick timing tests the difference isn't that large:

In [717]: dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
In [718]: timeit dict_homo['a']+=1
10000 loops, best of 3: 25.9 µs per loop
In [719]: np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
In [720]: timeit np_homo['a'] += 1
10000 loops, best of 3: 29.3 µs per loop

In the dict_homo case, the fact that the array is embedded in a dictionary is a minor point. Simple dictionary access like this is fast, basically the same as accessing the array by variable name.

So the first case is basically a test of += for a 1d array.
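A quick way to see that the dictionary adds nothing beyond a lookup: += on dict_homo['a'] fetches the array, updates it in place, and stores the same object back under the key. A minimal sketch (the name arr is only for illustration):

```python
import numpy as np

dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}

arr = dict_homo['a']      # keep a direct reference to the array
dict_homo['a'] += 1       # lookup, in-place add, store back under the same key

# The dictionary still holds the very same array object, now updated.
print(dict_homo['a'] is arr)
print(arr[:3])
```

So the timed work is the in-place add on a contiguous 1d array; the dictionary lookup and store around it are small by comparison.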

In the structured case, the a and b values alternate in the data buffer, so np_homo['a'] is a view that 'pulls out' every other number. So it's not surprising that it would be a bit slower.
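The interleaving can be inspected through the strides of the field view. A small sketch, assuming the same dtype as above (plain and field_a are names introduced here for illustration):

```python
import numpy as np

n = 10000
np_homo = np.zeros(n, dtype=[('a', np.double), ('b', np.double)])
plain = np.zeros(n)

field_a = np_homo['a']   # a view onto the 'a' slot of every record

print(np_homo.dtype.itemsize)  # bytes per (a, b) record
print(field_a.strides)         # byte step between consecutive 'a' values
print(plain.strides)           # byte step in a standalone 1d array

# Writing through the field view modifies the structured array itself.
field_a += 1
print(np_homo[0])
```

The field view steps over a whole record to reach the next 'a' value, while the standalone array steps over a single double.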

In [721]: np_homo
Out[721]: 
array([(41111.0, 0.0), (41111.0, 0.0), (41111.0, 0.0), ..., (41111.0, 0.0),
       (41111.0, 0.0), (41111.0, 0.0)], 
      dtype=[('a', '<f8'), ('b', '<f8')])

A 2d array also interleaves the column values.

In [722]: np_twod=np.zeros((10000,2), np.double)
In [723]: timeit np_twod[:,0]+=1
10000 loops, best of 3: 36.8 µs per loop

Surprisingly it's actually a bit slower than the structured case. Using order='F' or (2,10000) shape speeds it up a bit, but still not quite as good as the structured case.
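The 2d layouts mentioned here differ in the stride of the slice being updated. A small sketch that prints those strides (the variable names are chosen for illustration):

```python
import numpy as np

# C-ordered (10000, 2): the two columns are interleaved row by row.
twod_c = np.zeros((10000, 2), np.double)
col_c = twod_c[:, 0]

# Fortran-ordered (10000, 2): each column is stored contiguously.
twod_f = np.zeros((10000, 2), np.double, order='F')
col_f = twod_f[:, 0]

# C-ordered (2, 10000): each row is stored contiguously.
twod_t = np.zeros((2, 10000), np.double)
row_t = twod_t[0]

for name, view in [('C column', col_c), ('F column', col_f), ('row', row_t)]:
    print(name, view.strides, view.flags['C_CONTIGUOUS'])
```

Only the first slice skips over interleaved values; the other two are contiguous runs of doubles.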

These are small test times, so I won't make grand claims. But the structured array doesn't look bad.

Another set of timing tests, initializing the array or dictionary fresh each step:

In [730]: %%timeit np_twod=np.zeros((10000,2), np.double)
np_twod[:,0] += 1
   .....: 
10000 loops, best of 3: 36.7 µs per loop
In [731]: %%timeit np_homo = np.zeros(10000, dtype=[('a', np.double), ('b', np.double)])
np_homo['a'] += 1
   .....: 
10000 loops, best of 3: 38.3 µs per loop
In [732]: %%timeit dict_homo = {'a': np.zeros(10000), 'b': np.zeros(10000)}
dict_homo['a'] += 1
   .....: 
10000 loops, best of 3: 25.4 µs per loop

2d and structured are closer, with somewhat better performance for the dictionary (1d) case. I tried this with np.ones as well, since np.zeros can have delayed allocation, but saw no difference in behavior.
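The timings above only cover the homogeneous dtype. For the heterogeneous dtype from the question, one thing that may be worth checking is field alignment. This is an assumption, not something measured here: packing a double next to an int32 gives a 12-byte record, so every other 'a' value starts at an address that is not a multiple of 8. np.dtype accepts align=True, which pads the record like a C struct. A minimal sketch for inspecting the two layouts (np_hetro_aligned is a name introduced for illustration; the sizes in the comments assume a typical 64-bit platform):

```python
import numpy as np

fields = [('a', np.double), ('b', np.int32)]
packed = np.dtype(fields)               # default: fields packed, no padding
aligned = np.dtype(fields, align=True)  # padded the way a C compiler would

print(packed.itemsize, aligned.itemsize)              # record size in bytes
print(packed.isalignedstruct, aligned.isalignedstruct)

np_hetro = np.zeros(1000, dtype=packed)
np_hetro_aligned = np.zeros(1000, dtype=aligned)

# Byte step between consecutive 'a' values in each layout.
print(np_hetro['a'].strides, np_hetro_aligned['a'].strides)

# Whether NumPy considers the field view suitably aligned for its dtype.
print(np_hetro['a'].flags.aligned, np_hetro_aligned['a'].flags.aligned)
```

Whether the padded layout actually changes the += timing would need to be measured on the NumPy version and machine in question.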


answered Sep 18 '22 15:09

hpaulj
