Numpy Array: Efficiently find matching indices

Tags: python, numpy, scipy

I have two lists: one is massive (millions of elements), the other has several thousand elements. I want to do the following:

bigArray = np.array([0, 1, 0, 2, 3, 2, ...])

smallArray = np.array([0, 1, 2, 3, 4])

for i in range(len(smallArray)):
    pts = np.where(bigArray == smallArray[i])
    # Do stuff with pts...

The above works, but is slow. Is there any way to do this more efficiently without resorting to writing something in C?


asked Apr 25 '12 17:04

user1356855

1 Answer

In your case you may benefit from presorting your big array. The example below shows how this reduces the time from ~45 seconds to ~2 seconds on my laptop, for one particular pair of array lengths (5e6 vs 1e3). Obviously, the solution won't be optimal if the array sizes are vastly different. For example, the default solution has complexity O(bigN*smallN), whereas my suggested solution has complexity O((bigN+smallN)*log(bigN)).

import bisect
import time

import numpy as np
import numpy.random as nprand

bigN = 5000000  # must be an int; a float such as 5e6 is rejected as a size
smallN = 1000
maxn = 10000000
nprand.seed(1)
bigArr = nprand.randint(0, maxn, size=bigN)
smallArr = nprand.randint(0, maxn, size=smallN)

# brute force: one full scan of bigArr per element of smallArr
t1 = time.time()
for i in range(len(smallArr)):
    inds = np.where(bigArr == smallArr[i])[0]
t2 = time.time()
print("Brute", t2 - t1)

# not brute force (like nested loop with index scan):
# sort once, then binary-search the sorted copy for each value
t1 = time.time()
sortedind = np.argsort(bigArr)
sortedbigArr = bigArr[sortedind]
for i in range(len(smallArr)):
    i1 = bisect.bisect_left(sortedbigArr, smallArr[i])
    i2 = bisect.bisect_right(sortedbigArr, smallArr[i])
    inds = sortedind[i1:i2]  # original indices where bigArr == smallArr[i]
t2 = time.time()
print("Non-brute", t2 - t1)

Output:

Brute 42.5278530121

Non-brute 1.57193303108
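
As a possible variation on the sorted approach above (a sketch only, not timed here), the two bisect calls per value could probably be replaced by np.searchsorted, which binary-searches all values of the small array in one call. The function name matching_indices and the tiny test arrays are made up for illustration.

```python
import numpy as np


def matching_indices(big, small):
    """For each value in small, return the indices of big that equal it."""
    # Sort once; order[k] is the original index of the k-th smallest element.
    order = np.argsort(big, kind="stable")
    sorted_big = big[order]
    # Binary-search every value of small at once instead of looping with bisect.
    lo = np.searchsorted(sorted_big, small, side="left")
    hi = np.searchsorted(sorted_big, small, side="right")
    # Each slice is empty when the value does not occur in big.
    return [order[a:b] for a, b in zip(lo, hi)]


big = np.array([0, 1, 0, 2, 3, 2])
small = np.array([0, 1, 2, 3, 4])
for value, inds in zip(small, matching_indices(big, small)):
    print(value, inds)
```

The stable sort is only there so that the indices for each value come back in ascending order; the list comprehension still loops over the small array, but it only slices and does no searching.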

answered Sep 30 '22 13:09

sega_sai

