I have two 1D arrays, x & y, one smaller than the other. I'm trying to find the index of every element of y in x.
I've found two naive ways to do this, the first is slow, and the second memory-intensive.
indices= []
for iy in y:
indices += np.where(x==iy)[0][0]
xe = np.outer([1,]*len(x), y)
ye = np.outer(x, [1,]*len(y))
junk, indices = np.where(np.equal(xe, ye))
Is there a faster way or less memory intensive approach? Ideally the search would take advantage of the fact that we are searching for not one thing in a list, but many things, and thus is slightly more amenable to parallelization. Bonus points if you don't assume that every element of y is actually in x.
I want to suggest one-line solution:
indices = np.where(np.in1d(x, y))[0]
The result is an array with indices for x array which corresponds to elements from y which were found in x.
One can use it without numpy.where if needs.
As Joe Kington said, searchsorted() can search element very quickly. To deal with elements that are not in x, you can check the searched result with original y, and create a masked array:
import numpy as np
x = np.array([3,5,7,1,9,8,6,6])
y = np.array([2,1,5,10,100,6])
index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)
yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y
result = np.ma.array(yindex, mask=mask)
print result
the result is:
[-- 3 1 -- -- 6]
How about this?
It does assume that every element of y is in x, (and will return results even for elements that aren't!) but it is much faster.
import numpy as np
# Generate some example data...
x = np.arange(1000)
np.random.shuffle(x)
y = np.arange(100)
# Actually preform the operation...
xsorted = np.argsort(x)
ypos = np.searchsorted(x[xsorted], y)
indices = xsorted[ypos]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With