It looks like sorting NumPy structured and record arrays by a single column is much slower than sorting a similar standalone array:
In [111]: a = np.random.rand(int(1e4))
In [112]: b = np.random.rand(int(1e4))
In [113]: rec = np.rec.fromarrays([a,b])
In [114]: timeit rec.argsort(order='f0')
100 loops, best of 3: 18.8 ms per loop
In [115]: timeit a.argsort()
1000 loops, best of 3: 891 µs per loop
There is a marginal improvement using the structured array, but it's not dramatic:
In [120]: struct = np.empty(len(a),dtype=[('a','f8'),('b','f8')])
In [121]: struct['a'] = a
In [122]: struct['b'] = b
In [124]: timeit struct.argsort(order='a')
100 loops, best of 3: 15.8 ms per loop
This indicates that it's potentially faster to create an index array from argsort and then use that to reorder the individual arrays. This is OK except that I expect to be dealing with very large arrays and would like to avoid copying data as much as possible. Is there a more efficient way of doing this that I'm missing?
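To be concrete, the index-array approach I have in mind is something like this (a minimal sketch using a and b from above):
# argsort one of the standalone arrays, then use the resulting index
# array to reorder each column; the fancy indexing copies every array,
# which is what I would like to avoid for very large inputs
idx = a.argsort()
a_sorted = a[idx]
b_sorted = b[idx]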
What's slowing you down is the use of order, not the fact that you have a record array. If you want to sort by a single field, do it like this:
In [12]: %timeit np.argsort(rec['f0'])
1000 loops, best of 3: 829 µs per loop
Once order is used, performance goes south no matter how many fields you want to sort by:
In [16]: %timeit np.argsort(rec, order=['f0'])
10 loops, best of 3: 27.9 ms per loop
In [17]: %timeit np.argsort(rec, order=['f0', 'f1'])
10 loops, best of 3: 28.4 ms per loop
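If the goal is a reordered record array rather than just the index, a minimal sketch (reusing rec from the question) might be:
# argsort the single field, then apply the permutation with fancy
# indexing; this allocates a new, sorted copy of the record array
inds = np.argsort(rec['f0'])
sorted_rec = rec[inds]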
As Jaime said, you can use argsort to sort the record array:
inds = np.argsort(rec['f0'])
And use take to avoid making a copy:
np.take(rec, inds, out=rec)
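Putting both answers together, a self-contained sketch might look like the following (the array size and the final assert are just for illustration). The NumPy docs note that out is buffered when mode='raise' (the default), which should make out=rec safe here:
import numpy as np

# build a record array with two float fields, f0 and f1
a = np.random.rand(10000)
b = np.random.rand(10000)
rec = np.rec.fromarrays([a, b])

# fast path: argsort a single field instead of using order=
inds = np.argsort(rec['f0'])

# reorder the record array via take; out=rec writes the result back
# into the same array (buffered internally with the default mode='raise')
np.take(rec, inds, out=rec)

assert np.all(np.diff(rec['f0']) >= 0)  # rec is now sorted by f0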