I have two arrays, A (length 3.8 million) and B (length 20k).
For a minimal example, let's take this case:
A = np.array([1,1,2,3,3,3,4,5,6,7,8,8])
B = np.array([1,2,8])
Now I want the resulting array to be:
C = np.array([3,3,3,4,5,6,7])
i.e. if any value in B is found in A, remove it from A; otherwise keep it.
I would like to know if there is a way to do this without a for loop, because the arrays are long and looping over them takes a long time.
searchsorted
With B sorted (and no value in A larger than the largest value in B), we can use searchsorted -
A[B[np.searchsorted(B,A)] != A]
From the linked docs, searchsorted(a,v) finds the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved. So, let's say idx = searchsorted(B,A) and we index into B with those: B[idx]. We get a mapped version of B corresponding to every element in A. Comparing this mapped version against A tells us, for every element in A, whether there's a match in B or not. Finally, we index into A with that mask to select the non-matching ones.
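The steps described above can be traced one at a time. This is a small sketch on the arrays from the question; the intermediate names mapped and mask are only illustrative and do not appear in the original answer.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([1, 2, 8])  # already sorted, and max(A) <= max(B)

idx = np.searchsorted(B, A)  # insertion position in B for each element of A
mapped = B[idx]              # the element of B sitting at that position
mask = mapped != A           # True where the element of A has no match in B
C = A[mask]

print(idx.tolist())     # [0, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(mapped.tolist())  # [1, 1, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8]
print(C.tolist())       # [3, 3, 3, 4, 5, 6, 7]
```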
Generic case (B is not sorted):
If B is not already sorted, as the method requires, sort it first and then use the proposed method.
Alternatively, we can use the sorter argument of searchsorted -
sidx = B.argsort()
out = A[B[sidx[np.searchsorted(B,A,sorter=sidx)]] != A]
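As a quick check of the sorter variant, here is a sketch with B deliberately shuffled (the shuffled order is my own choice for illustration). It still assumes no value in A exceeds the largest value in B.

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8])
B = np.array([8, 1, 2])  # same values as in the question, but unsorted

sidx = B.argsort()                        # indices that would sort B
idx = np.searchsorted(B, A, sorter=sidx)  # positions relative to the sorted order
out = A[B[sidx[idx]] != A]                # map back through sidx before comparing

print(sidx.tolist())  # [1, 2, 0]
print(out.tolist())   # [3, 3, 3, 4, 5, 6, 7]
```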
More generic case (A has values higher than the ones in B):
sidx = B.argsort()
idx = np.searchsorted(B,A,sorter=sidx)
idx[idx==len(B)] = 0
out = A[B[sidx[idx]] != A]
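Resetting the out-of-range positions to 0 is safe because a value of A that is larger than everything in B cannot equal the smallest element of B, so it is still kept. A minimal sketch, using the extended A from the follow-up answer and an unsorted B of my own choosing:

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 10, 12, 14])
B = np.array([8, 1, 2])

sidx = B.argsort()
idx = np.searchsorted(B, A, sorter=sidx)
# 10, 12 and 14 get position len(B); point them at the smallest element of B instead
idx[idx == len(B)] = 0
out = A[B[sidx[idx]] != A]

print(out.tolist())  # [3, 3, 3, 4, 5, 6, 7, 10, 12, 14]
```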
in1d/isin
We can also use np.in1d, which is pretty straightforward (the linked docs should help clarify): it looks for a match in B for every element in A, and then we can use boolean indexing with an inverted mask to select the non-matching ones -
A[~np.in1d(A,B)]
Same with isin -
A[~np.isin(A,B)]
With the invert flag -
A[np.in1d(A,B,invert=True)]
A[np.isin(A,B,invert=True)]
This handles the generic case, where B is not necessarily sorted.
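For reuse, the isin variant can be wrapped in a small function. This is only a sketch: the name remove_values is hypothetical, and the input arrays are made up to show that neither A nor B needs to be sorted and that the order of A is preserved.

```python
import numpy as np

def remove_values(A, B):
    """Return the elements of A that do not occur in B (hypothetical helper name)."""
    return A[~np.isin(A, B)]

A = np.array([7, 1, 3, 8, 1, 12, 3, 2])  # unsorted, includes a value above max(B)
B = np.array([8, 1, 2])                  # unsorted

result = remove_values(A, B)
print(result.tolist())  # [7, 3, 12, 3]
```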
Adding to Divakar's answer above -
if A contains values larger than the largest value in B, the first searchsorted snippet will give you an 'index out of bounds' error. See:
A = np.array([1,1,2,3,3,3,4,5,6,7,8,8,10,12,14])
B = np.array([1,2,8])
A[B[np.searchsorted(B,A)] != A]
>> IndexError: index 3 is out of bounds for axis 0 with size 3
This happens because np.searchsorted assigns index 3 (one past the last valid index in B) as the insertion position for the elements 10, 12 and 14 from A in this example. Thus you get an IndexError in B[np.searchsorted(B,A)].
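The out-of-range positions can be inspected directly. A short sketch on the same arrays (the name out_of_range is mine, used only for illustration):

```python
import numpy as np

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 10, 12, 14])
B = np.array([1, 2, 8])

idx = np.searchsorted(B, A)
# Positions equal to len(B) fall one past the end of B and cannot be used as indices
out_of_range = idx == len(B)

print(idx[out_of_range].tolist())  # [3, 3, 3]
print(A[out_of_range].tolist())    # [10, 12, 14]
```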
To circumvent that, a possible approach is:
def subset_sorted_array(A,B):
    # Assumes A and B are both sorted in ascending order
    Aa = A[np.where(A <= np.max(B))]
    Bb = (B[np.searchsorted(B,Aa)] != Aa)
    # np.pad takes 'mode', not 'method'
    Bb = np.pad(Bb,(0,A.shape[0]-Aa.shape[0]), mode='constant', constant_values=True)
    return A[Bb]
Which works as follows:
# Take only the elements of A that are not larger than the largest value in B
# (this assumes A is sorted, so the dropped elements are all at the end)
Aa = A[np.where(A <= np.max(B))]
# Build the filter for those elements
Bb = (B[np.searchsorted(B,Aa)] != Aa)
# Pad the filter with 'True' for the elements dropped above - I split this in two
# operations for easier reading
Bb = np.pad(Bb,(0,A.shape[0]-Aa.shape[0]), mode='constant', constant_values=True)
# Then you can filter A by Bb
A[Bb]
# For the input arrays above:
>> array([ 3, 3, 3, 4, 5, 6, 7, 10, 12, 14])
Notice this will also work between arrays of strings and other types (for all types for which the comparison operator <= is defined).
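As a sanity check, the padded approach can be compared against np.isin on the integer arrays above. This sketch assumes that both A and B are sorted in ascending order, since the padding only lines up when every value above max(B) sits at the end of A; it also passes mode='constant' to np.pad, which is the keyword np.pad accepts.

```python
import numpy as np

def subset_sorted_array(A, B):
    # Assumption: A and B are sorted ascending, so values above max(B) are at the end of A
    Aa = A[A <= np.max(B)]
    keep = B[np.searchsorted(B, Aa)] != Aa
    # Mark the trailing elements (those above max(B)) as kept
    keep = np.pad(keep, (0, A.shape[0] - Aa.shape[0]), mode='constant', constant_values=True)
    return A[keep]

A = np.array([1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 10, 12, 14])
B = np.array([1, 2, 8])

result = subset_sorted_array(A, B)
expected = A[~np.isin(A, B)]
print(result.tolist())                   # [3, 3, 3, 4, 5, 6, 7, 10, 12, 14]
print(np.array_equal(result, expected))  # True
```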