
Fast duplicates removal in numpy and python

Is there any fast way to obtain unique elements in numpy? I have code similar to this (the relevant part is the last line):

import numpy

tab = numpy.arange(100000000)

indices1 = numpy.random.permutation(10000)
indices2 = indices1.copy()
indices3 = indices1.copy()
indices4 = indices1.copy()

result = numpy.unique(numpy.array([tab[indices1], tab[indices2], tab[indices3], tab[indices4]]))

This is just an example; in my situation indices1, indices2, ..., indices4 contain different sets of indices and have various sizes. The last line is executed many times, and I noticed that it is actually the bottleneck in my code ({numpy.core.multiarray.arange}, to be precise). Besides, ordering is not important and the elements in the indices arrays are of int32 type. I was thinking about using a hashtable with the element value as key, and tried:

import itertools

seq = itertools.chain(tab[indices1].flatten(), tab[indices2].flatten(), tab[indices3].flatten(), tab[indices4].flatten())
myset = {}
map(myset.__setitem__, seq, [])
result = numpy.array(myset.keys())

but it was even worse.

Is there any way to speed this up? I guess the performance penalty comes from 'fancy indexing', which copies the array, but I only need to read the resulting elements (I don't modify anything).
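As a quick sanity check of that guess (a minimal sketch, assuming a NumPy version that provides numpy.shares_memory): basic slicing returns a view into the original array, while fancy indexing always returns a fresh copy:

import numpy

tab = numpy.arange(10)

view = tab[2:6]            # basic slice: a view, shares memory with tab
copy = tab[[2, 3, 4, 5]]   # fancy indexing: always a fresh copy

print(numpy.shares_memory(tab, view))  # True
print(numpy.shares_memory(tab, copy))  # False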

asked Dec 23 '11 by pzo



2 Answers

[What follows is actually partially incorrect (see the PS):]

The following way of obtaining the unique elements in all sub-arrays is very fast:

seq = itertools.chain(tab[indices1].flat, tab[indices2].flat, tab[indices3].flat, tab[indices4].flat)
result = set(seq)

Note that flat (which returns an iterator) is used instead of flatten() (which returns a full array), and that set() can be called directly (instead of using map() and a dictionary, as in your second method).
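A minimal sketch of that difference (not from the original answer): flat yields the elements lazily, while flatten() allocates a new array:

import numpy

a = numpy.arange(6).reshape(2, 3)

it = a.flat          # numpy.flatiter: lazy, element by element, no copy
arr = a.flatten()    # a new 1-D array holding a full copy of the data

print(next(it))      # 0
print(arr)           # [0 1 2 3 4 5]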

Here are the timing results (obtained in the IPython shell):

>>> %timeit result = numpy.unique(numpy.array([tab[indices1], tab[indices2], tab[indices3], tab[indices4]]))
100 loops, best of 3: 8.04 ms per loop
>>> seq = itertools.chain(tab[indices1].flat, tab[indices2].flat, tab[indices3].flat, tab[indices4].flat)
>>> %timeit set(seq)
1000000 loops, best of 3: 223 ns per loop

The set/flat method thus appears to be about 36,000 times faster on this example (8.04 ms vs. 223 ns).

PS: The timing of set(seq) is actually not representative. In fact, the first timing loop empties the seq iterator, and the subsequent set() evaluations return an empty set! The correct timing test is the following:

>>> %timeit set(itertools.chain(tab[indices1].flat, tab[indices2].flat, tab[indices3].flat, tab[indices4].flat))
100 loops, best of 3: 9.12 ms per loop

which shows that the set/flat method is actually not faster.
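The underlying gotcha is easy to reproduce in isolation (a minimal sketch): an exhausted iterator yields nothing on later passes, so only the first timing loop did any real work:

import itertools

seq = itertools.chain([1, 2], [3, 4])

print(set(seq))  # {1, 2, 3, 4} -- the first pass consumes the iterator
print(set(seq))  # set() -- the chain is now exhausted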

PPS: here is an (unsuccessful) exploration of mtrw's suggestion; finding the unique indices beforehand might be a good idea, but I can't find a way to implement it which is faster than the above approach:

>>> %timeit set(indices1).union(indices2).union(indices3).union(indices4)
100 loops, best of 3: 11.9 ms per loop
>>> %timeit set(itertools.chain(indices1.flat, indices2.flat, indices3.flat, indices4.flat))
100 loops, best of 3: 10.8 ms per loop

Thus, finding the set of all distinct indices is itself quite slow.

PPPS: numpy.unique(<concatenated array of indices>) is actually 2-3 times faster than set(<concatenated array of indices>). This is key to the speed up obtained in Bago's answer (unique(concatenate((…)))). The reason is probably that letting NumPy handle its arrays by itself is generally faster than interfacing pure Python (set) with NumPy arrays.
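A sketch for checking this on your own data (the array sizes below are chosen arbitrarily, not taken from the question):

import numpy
import timeit

indices = numpy.concatenate([numpy.random.randint(0, 100000, 10000) for _ in range(4)])

# Pure-NumPy path: sorting and de-duplication stay in compiled code.
t_unique = timeit.timeit(lambda: numpy.unique(indices), number=100)

# Python-set path: every element crosses the NumPy/Python boundary.
t_set = timeit.timeit(lambda: set(indices), number=100)

print(t_unique, t_set)  # numpy.unique is typically a few times faster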

Conclusion: this answer therefore only documents failed attempts that should not be fully followed, as well as a possibly useful remark about timing code with iterators…

answered by Eric O Lebigot


Sorry I don't completely understand your question, but I'll do my best to help.

First, {numpy.core.multiarray.arange} is numpy.arange, not fancy indexing; unfortunately, fancy indexing does not show up as a separate line item in the profiler. If you're calling np.arange inside the loop, you should see whether you can move it outside (see the sketch after the profiler output below):

In [27]: prun tab[tab]
     2 function calls in 1.551 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    1.551    1.551    1.551    1.551 <string>:1(<module>)
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

In [28]: prun numpy.arange(10000000)
     3 function calls in 0.051 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.047    0.047    0.047    0.047 {numpy.core.multiarray.arange}
    1    0.003    0.003    0.051    0.051 <string>:1(<module>)
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
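To make the first point concrete, here is a sketch of hoisting the call out of a loop (the loop itself is hypothetical, not taken from the question):

import numpy

# Slow: the expensive arange call is paid on every iteration.
# for _ in range(100):
#     tab = numpy.arange(10000000)
#     ... use tab ...

# Fast: build tab once, reuse it on every iteration.
tab = numpy.arange(10000000)
for _ in range(100):
    pass  # ... use tab here without rebuilding it ...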

Second, I assume that tab is not np.arange(a, b) in your real code, because if it were, then tab[index] == index + a and you wouldn't need the lookup at all; I assume that was just for your example.

Third, np.concatenate is about 10 times faster than np.array:

In [47]: timeit numpy.array([tab[indices1], tab[indices2], tab[indices3], tab[indices4]])
100 loops, best of 3: 5.11 ms per loop

In [48]: timeit numpy.concatenate([tab[indices1], tab[indices2], tab[indices3], tab[indices4]])
1000 loops, best of 3: 544 us per loop

(Also, np.concatenate gives a (4*n,) array while np.array gives a (4, n) array, where n is the length of indices1-4. The latter will only work if indices1-4 are all the same length.)
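A quick shape check of that remark (a minimal sketch with equal-length arrays, since np.array requires them):

import numpy

n = 5
parts = [numpy.zeros(n)] * 4

print(numpy.array(parts).shape)        # (4, 5): rows stacked into a 2-D array
print(numpy.concatenate(parts).shape)  # (20,): one flat 1-D array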

And last, you could also save even more time if you can do the following:

indices = np.unique(np.concatenate((indices1, indices2, indices3, indices4)))
result = tab[indices]

Doing it in this order is faster because you reduce the number of indices you need to look up in tab, but it will only work if you know that the elements of tab are unique (otherwise you can get repeats in result even if the indices are unique).
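The caveat is easy to demonstrate (a contrived sketch, not from the answer): if tab itself contains duplicate values, unique indices still produce repeats:

import numpy

tab = numpy.array([7, 7, 9])     # tab contains a duplicated value
indices = numpy.array([0, 1])    # the indices themselves are unique

print(tab[indices])  # [7 7] -- repeats, because tab[0] == tab[1]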

Hope that helps

answered by Bi Rico