After many attempts at optimizing this code, it seems that one last resort would be to run it on multiple cores. I don't know exactly how to convert/restructure my code so that it runs much faster using multiple cores, and I would appreciate any guidance toward that end goal. The goal is to run this code as fast as possible for arrays A and B where each array holds about 700,000 elements. Here is the code using small arrays; the 700k-element arrays are commented out.
import numpy as np

def ismember(a, b):
    # For each element of a, yield the lowest index of b holding that
    # value, or 0 if the value does not occur in b.
    for i in a:
        index = np.where(b == i)[0]
        if index.size == 0:
            yield 0
        else:
            yield index[0]

def f(A, gen_obj):
    my_array = np.arange(len(A))
    for i in my_array:
        my_array[i] = next(gen_obj)
    return my_array

#A = np.arange(700000)
#B = np.arange(700000)
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])

gen_obj = ismember(A, B)
f(A, gen_obj)
print('done')
# if we print f(A, gen_obj) the output will be: [4 0 0 4 3]
# notice that the output array needs to be kept the same size as array A.
What I am trying to do is mimic the MATLAB function ismember (the one with the signature [Lia,Locb] = ismember(A,B)). I am only trying to get the Locb part.
From the MATLAB documentation: Locb contains the lowest index in B for each value in A that is a member of B. The output array Locb contains 0 wherever A is not a member of B.
One of the main problems is that I need to be able to perform this operation as efficiently as possible. For testing I have two arrays of 700k elements each. Creating a generator and walking through its values doesn't seem to get the job done fast.
Before worrying about multiple cores, I would eliminate the linear scan in your ismember function by using a dictionary:
def ismember(a, b):
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            # record only the first (lowest) index at which each value occurs in b
            bind[elt] = i
    return [bind.get(itm, None) for itm in a]  # None can be replaced by any other "not in b" value
Your original implementation requires a full scan of the elements in B for each element in A, making it O(len(A)*len(B)). The above code requires one full scan of B to build the dict bind. By using a dict, you effectively make the lookup of each element of A in B constant time, so the whole operation is O(len(A)+len(B)). If this is still too slow, then worry about making the above function run on multiple cores.
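If you do get to that point, here is a minimal sketch of one way to split the lookup across cores with multiprocessing (my own addition, not part of the original answer; the worker count and helper names are arbitrary). Be aware that for a cheap per-element dict lookup, the process start-up and pickling overhead can easily outweigh the gain:

import numpy as np
from multiprocessing import Pool

_bind = None  # per-worker dict, built once in each worker process

def init_worker(b):
    # build the value -> lowest-index dict inside each worker
    global _bind
    _bind = {}
    for i, elt in enumerate(b):
        if elt not in _bind:
            _bind[elt] = i

def lookup_chunk(chunk):
    # look up one slice of A against the worker's dict
    return [_bind.get(itm, None) for itm in chunk]

if __name__ == '__main__':
    A = np.arange(700000)
    B = np.arange(700000)
    with Pool(processes=4, initializer=init_worker, initargs=(B,)) as pool:
        parts = pool.map(lookup_chunk, np.array_split(A, 4))
    result = [idx for part in parts for idx in part]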
Edit: I've also modified your indexing slightly. MATLAB uses 0 as its "not found" value because all of its arrays start at index 1. Python/numpy arrays start at 0, so if your data set looks like this

A = [2378, 2378, 2378, 2378]
B = [2378, 2379]

and you return 0 for "no element", then your results will exclude all elements of A. The above routine returns None for "no index" instead of 0. Returning -1 is an option, but Python will interpret that as the last element in the array, whereas None will raise an exception if it's used as an index into the array. If you'd like different behavior, change the second argument in the bind.get(itm, None) expression to the value you want returned.
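As a quick sanity check (my own addition), running the dictionary version on the question's sample arrays gives:

A = [3, 4, 4, 3, 6]
B = [2, 5, 2, 6, 3]
print(ismember(A, B))
# [4, None, None, 4, 3]
# with bind.get(itm, -1) the result would be [4, -1, -1, 4, 3]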
sfstewman's excellent answer most likely solved the issue for you.
I'd just like to add how you can achieve the same exclusively in numpy.
I make use of numpy's unique and in1d functions.
B_unique_sorted, B_idx = np.unique(B, return_index=True)
B_in_A_bool = np.in1d(B_unique_sorted, A, assume_unique=True)
B_unique_sorted contains the unique values of B, sorted. B_idx holds, for each of these values, its index into the original B. B_in_A_bool is a boolean array the size of B_unique_sorted that stores whether a value in B_unique_sorted is in A. Note: the lookup is done in this direction because the output indices must refer back to B_idx, and A is assumed to be already unique.

Now you can use B_in_A_bool to get either the common values,

B_unique_sorted[B_in_A_bool]

or their respective indices in the original B:

B_idx[B_in_A_bool]
Finally, I assume that this is significantly faster than the pure Python for-loop, although I didn't test it.
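To turn these pieces into a full MATLAB-style Locb array, here is a sketch of my own (not from the answer above) that uses np.searchsorted on the same np.unique output, so that an index comes back for every element of A:

import numpy as np

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])

B_unique_sorted, B_idx = np.unique(B, return_index=True)
# position of each element of A within the sorted unique values of B
pos = np.searchsorted(B_unique_sorted, A)
pos[pos == len(B_unique_sorted)] = 0    # clamp positions that ran off the end
found = B_unique_sorted[pos] == A       # True where A's value actually occurs in B
Locb = np.where(found, B_idx[pos], -1)  # -1 (or any other sentinel) where not found
print(Locb)
# [ 4 -1 -1  4  3]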
Try the ismember library.
pip install ismember
Simple example:
# Import library
from ismember import ismember
import numpy as np
# data
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])
# Lookup
Iloc,idx = ismember(A, B)
# Iloc is boolean defining existence of d in d_unique
print(Iloc)
# [ True False False True True]
# indexes of d_unique that exists in d
print(idx)
# [4 4 3]
print(B[idx])
# [3 3 6]
print(A[Iloc])
# [3 3 6]
# These vectors will match
A[Iloc]==B[idx]
Speed check:
import numpy as np
from ismember import ismember
from datetime import datetime

def ismember_stack(a, b):
    # dictionary-based version from the accepted answer
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            bind[elt] = i
    return [bind.get(itm, None) for itm in a]  # None can be replaced by any other "not in b" value

t1 = []
t2 = []

# Create some random vectors
ns = np.random.randint(10, 10000, 1000)

for n in ns:
    a_vec = np.random.randint(0, 100, n)
    b_vec = np.random.randint(0, 100, n)

    # Run stack version
    start = datetime.now()
    out1 = ismember_stack(a_vec, b_vec)
    end = datetime.now()
    t1.append(end - start)

    # Run ismember
    start = datetime.now()
    out2 = ismember(a_vec, b_vec)
    end = datetime.now()
    t2.append(end - start)

print(np.sum(t1))
# 0:00:07.778331
print(np.sum(t2))
# 0:00:04.609801
The ismember function from PyPI is almost 2x faster.
Large vectors, e.g. 700,000 elements:

import numpy as np
from ismember import ismember
from datetime import datetime

A = np.random.randint(0, 100, 700000)
B = np.random.randint(0, 100, 700000)

# Lookup
start = datetime.now()
Iloc, idx = ismember(A, B)
end = datetime.now()

# Print time
print(end - start)
# 0:00:01.194801