I have a problem that I think should be easy, but I can't find a solution. I want to find the matching values in two arrays, then use the indices of those matches in one of the arrays to look up values in a third array.
import numpy as np
a = np.random.randint(0, 200, 100)  # random integer array
b1 = np.random.randint(0, 100, 50)
b2 = b1**3
c = []
for i in range(len(a)):
    for j in range(len(b1)):
        if b1[j] == a[i]:
            # collect the b2 value at the position where b1 matches a
            c.append(b2[j])
c = np.asarray(c)
The method above works, but it is very slow. This is just an example; in my actual work a, b1 and b2 each have over 10,000 elements.
Any faster solutions?
np.in1d(b1, a) returns a boolean array indicating whether each element of b1 is found in a.
If you want the values in b2 that correspond to the indices of the values common to a and b1, you can use the boolean array to index b2:
b2[np.in1d(b1, a)]
Using this function should be a lot faster, as the for loops are pushed down to the level of NumPy's internal routines.
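To make the comparison concrete, here is a minimal sketch that runs the nested loop and the boolean-mask version on small fixed arrays. The helper name loop_lookup is made up for illustration, and np.isin is used as the newer spelling of np.in1d (assuming a NumPy version that provides it). Note that the mask version returns values in the order of b1, once per element of b1, while the loop returns them in the order of a and repeats a value when a contains duplicates, so the two agree on which values are found, not necessarily element by element.

```python
import numpy as np

def loop_lookup(a, b1, b2):
    """Nested-loop version from the question (hypothetical helper name)."""
    out = []
    for x in a:
        for j in range(len(b1)):
            if b1[j] == x:
                out.append(b2[j])
    return np.asarray(out)

a = np.array([5, 1, 7, 1, 9])
b1 = np.array([1, 2, 5, 8])
b2 = b1 ** 3

mask = np.isin(b1, a)      # same result as np.in1d(b1, a)
vectorized = b2[mask]      # values of b2 where b1 also occurs in a
looped = loop_lookup(a, b1, b2)

print(vectorized)  # [  1 125]
print(looped)      # [125   1   1]
```

If the loop's ordering or repeated values matter for your use case, the mask alone is not a drop-in replacement.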
You can use numpy.intersect1d to get the intersection of two 1-D arrays. Note that once you have the intersection, you don't need the indices to look the common values up again.
>>> a=np.random.randint(0,200,100)
>>> b1=np.random.randint(0,100,50)
>>>
>>> np.intersect1d(b1,a)
array([ 3, 9, 17, 19, 22, 23, 37, 53, 55, 58, 67, 85, 93, 94])
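If the values from a third array are still needed, as in the question, intersect1d can also hand back indices. The sketch below assumes a NumPy version that supports the return_indices argument (1.15 or later, as far as I know) and uses small fixed arrays rather than random ones. Only the first occurrence of each common value is reported, so duplicates in b1 are not all picked up.

```python
import numpy as np

a = np.array([5, 1, 7, 1, 9])
b1 = np.array([1, 2, 5, 8])
b2 = b1 ** 3

# return_indices=True also returns the index of the first occurrence
# of each common value in each input array.
common, idx_b1, idx_a = np.intersect1d(b1, a, return_indices=True)

print(common)        # [1 5]
print(b2[idx_b1])    # [  1 125]
```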
Using intersect1d is also more efficient than a[np.in1d(a, b1)], because in addition to calling in1d, Python has to perform an extra indexing step. The following benchmark illustrates this:
from timeit import timeit
# Each statement is timed as a whole, including array creation.
s1="""
import numpy as np
a=np.random.randint(0,200,100)
b1=np.random.randint(0,100,50)
np.intersect1d(b1,a)
"""
s2="""
import numpy as np
a=np.random.randint(0,200,100)
b1=np.random.randint(0,100,50)
a[np.in1d(a, b1)]
"""
print(' first: ', timeit(stmt=s1, number=100000))
print('second : ', timeit(stmt=s2, number=100000))
Result:
first: 3.69082999229
second : 7.77609300613
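When reading these timings, it may help to keep in mind that the two statements do not return the same thing, and that each timed statement also includes creating the random arrays, so the numbers are only a rough guide. A small sketch on fixed arrays (np.isin is used here as the newer spelling of np.in1d, assuming your NumPy provides it) shows the difference in output:

```python
import numpy as np

a = np.array([5, 1, 7, 1, 9])
b1 = np.array([1, 2, 5, 8])

by_intersection = np.intersect1d(b1, a)   # sorted, unique values
by_mask = a[np.isin(a, b1)]               # order of a, duplicates kept

print(by_intersection)      # [1 5]
print(by_mask)              # [5 1 1]
print(np.unique(by_mask))   # [1 5]
```

So the mask version only matches intersect1d after duplicates are removed and the result is sorted.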