Say I have these 2D arrays <code>A</code> and <code>B</code>. How can I remove elements from <code>A</code> that are in <code>B</code>. (Complement in set theory: A-B) <pre class="prettyprint"><code>A=np.asarray([[1,1,1], [1,1,2], [1,1,3], [1,1,4]]) B=np.asarray([[0,0,0], [1,0,2], [1,0,3], [1,0,4], [1,1,0], [1,1,1], [1,1,4]]) #output = [[1,1,2], [1,1,3]] </code></pre> <hr> To be more precise, I would like to do something like this. <pre class="prettyprint"><code>data = some numpy array label = some numpy array A = np.argwhere(label==0) #[[1 1 1], [1 1 2], [1 1 3], [1 1 4]] B = np.argwhere(data>1.5) #[[0 0 0], [1 0 2], [1 0 3], [1 0 4], [1 1 0], [1 1 1], [1 1 4]] out = np.argwhere(label==0 and data>1.5) #[[1 1 2], [1 1 3]] </code></pre>

there is an easy solution with a list comprehension, <pre class="prettyprint"><code>A = [i for i in A if i not in B] </code></pre> Result <pre class="prettyprint"><code>[[1, 1, 2], [1, 1, 3]] </code></pre> List comprehension is not removing the elements from the array, it's just reassigning - if you want to remove the elements, use this method: <pre class="prettyprint"><code>for i in B: if i in A: A.remove(i) </code></pre>

Here is a Numpythonic approach with broadcasting: <pre class="prettyprint"><code>In [83]: A[np.all(np.any((A-B[:, None]), axis=2), axis=0)] Out[83]: array([[1, 1, 2], [1, 1, 3]]) </code></pre> Here is a timeit with other answer: <pre class="prettyprint"><code>In [90]: def cal_diff(A, B): ....: A_rows = A.view([('', A.dtype)] * A.shape[1]) ....: B_rows = B.view([('', B.dtype)] * B.shape[1]) ....: return np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1]) ....: In [93]: %timeit cal_diff(A, B) 10000 loops, best of 3: 54.1 µs per loop In [94]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)] 100000 loops, best of 3: 9.41 µs per loop # Even better with Divakar's suggestion In [97]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)] 100000 loops, best of 3: 7.41 µs per loop </code></pre> Well, if you are looking for a faster way you should looking for ways that reduce the number of comparisons. In this case (without considering the order) you can generate a unique number from your rows and compare the numbers which can be done with summing the items power of two. Here is the benchmark with Divakar's in1d approach: <pre class="prettyprint"><code>In [144]: def in1d_approach(A,B): .....: dims = np.maximum(B.max(0),A.max(0))+1 .....: return A[~np.in1d(np.ravel_multi_index(A.T,dims),\ .....: np.ravel_multi_index(B.T,dims))] .....: In [146]: %timeit in1d_approach(A, B) 10000 loops, best of 3: 23.8 µs per loop In [145]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))] 10000 loops, best of 3: 20.2 µs per loop </code></pre> You can use <code>np.diff</code> to get the an order independent result: <pre class="prettyprint"><code>In [194]: B=np.array([[0, 0, 0,], [1, 0, 2,], [1, 0, 3,], [1, 0, 4,], [1, 1, 0,], [1, 1, 1,], [1, 1, 4,], [4, 1, 1]]) In [195]: A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))] Out[195]: array([[1, 1, 2], [1, 1, 3]]) In [196]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))] 10000 loops, best of 3: 30.7 µs per loop </code></pre> Benchmark with Divakar's setup: <pre class="prettyprint"><code>In [198]: B = np.random.randint(0,9,(1000,3)) In [199]: A = np.random.randint(0,9,(100,3)) In [200]: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0) In [201]: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0) In [202]: A[A_idx] = B[B_idx] In [203]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))] 10000 loops, best of 3: 137 µs per loop In [204]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))] 10000 loops, best of 3: 112 µs per loop In [205]: %timeit in1d_approach(A, B) 10000 loops, best of 3: 115 µs per loop </code></pre> Timing with larger arrays (Divakar's solution is slightly faster): <pre class="prettyprint"><code>In [231]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))] 1000 loops, best of 3: 1.01 ms per loop In [232]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))] 1000 loops, best of 3: 880 µs per loop In [233]: %timeit in1d_approach(A, B) 1000 loops, best of 3: 807 µs per loop </code></pre>

Removing elements from an array that are in another array

Tags:

python

arrays

numpy

Say I have these 2D arrays A and B.

How can I remove elements from A that are in B. (Complement in set theory: A-B)

A=np.asarray([[1,1,1], [1,1,2], [1,1,3], [1,1,4]])
B=np.asarray([[0,0,0], [1,0,2], [1,0,3], [1,0,4], [1,1,0], [1,1,1], [1,1,4]])
#output = [[1,1,2], [1,1,3]]

To be more precise, I would like to do something like this.

data = some numpy array
label = some numpy array
A = np.argwhere(label==0) #[[1 1 1], [1 1 2], [1 1 3], [1 1 4]]
B = np.argwhere(data>1.5) #[[0 0 0], [1 0 2], [1 0 3], [1 0 4], [1 1 0], [1 1 1], [1 1 4]]
out = np.argwhere(label==0 and data>1.5) #[[1 1 2], [1 1 3]]

294

asked Oct 15 '16 06:10

Jee Seok Yoon

4 Answers

there is an easy solution with a list comprehension,

A = [i for i in A if i not in B]

Result

[[1, 1, 2], [1, 1, 3]]

List comprehension is not removing the elements from the array, it's just reassigning - if you want to remove the elements, use this method:

for i in B:
     if i in A:
     A.remove(i)

164

answered Oct 17 '22 12:10

Rahul K P

Based on this solution to Find the row indexes of several values in a numpy array, here's a NumPy based solution with less memory footprint and could be beneficial when working with large arrays -

dims = np.maximum(B.max(0),A.max(0))+1
out = A[~np.in1d(np.ravel_multi_index(A.T,dims),np.ravel_multi_index(B.T,dims))]

Sample run -

In [38]: A
Out[38]: 
array([[1, 1, 1],
       [1, 1, 2],
       [1, 1, 3],
       [1, 1, 4]])

In [39]: B
Out[39]: 
array([[0, 0, 0],
       [1, 0, 2],
       [1, 0, 3],
       [1, 0, 4],
       [1, 1, 0],
       [1, 1, 1],
       [1, 1, 4]])

In [40]: out
Out[40]: 
array([[1, 1, 2],
       [1, 1, 3]])

Runtime test on large arrays -

In [107]: def in1d_approach(A,B):
     ...:     dims = np.maximum(B.max(0),A.max(0))+1
     ...:     return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
     ...:                     np.ravel_multi_index(B.T,dims))]
     ...: 

In [108]: # Setup arrays with B as large array and A contains some of B's rows
     ...: B = np.random.randint(0,9,(1000,3))
     ...: A = np.random.randint(0,9,(100,3))
     ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
     ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
     ...: A[A_idx] = B[B_idx]
     ...:

Timings with broadcasting based solutions -

In [109]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100 loops, best of 3: 4.64 ms per loop # @Kasramvd's soln

In [110]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100 loops, best of 3: 3.66 ms per loop

Timing with less memory footprint based solution -

In [111]: %timeit in1d_approach(A,B)
1000 loops, best of 3: 231 µs per loop

Further performance boost

in1d_approach reduces each row by considering each row as an indexing tuple. We can do the same a bit more efficiently by introducing matrix-multiplication with np.dot, like so -

def in1d_dot_approach(A,B):
    cumdims = (np.maximum(A.max(),B.max())+1)**np.arange(B.shape[1])
    return A[~np.in1d(A.dot(cumdims),B.dot(cumdims))]

Let's test it against the previous on much larger arrays -

In [251]: # Setup arrays with B as large array and A contains some of B's rows
     ...: B = np.random.randint(0,9,(10000,3))
     ...: A = np.random.randint(0,9,(1000,3))
     ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
     ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
     ...: A[A_idx] = B[B_idx]
     ...: 

In [252]: %timeit in1d_approach(A,B)
1000 loops, best of 3: 1.28 ms per loop

In [253]: %timeit in1d_dot_approach(A, B)
1000 loops, best of 3: 1.2 ms per loop

answered Oct 17 '22 12:10

Divakar

Here is a Numpythonic approach with broadcasting:

In [83]: A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
Out[83]: 
array([[1, 1, 2],
       [1, 1, 3]])

Here is a timeit with other answer:

In [90]: def cal_diff(A, B):
   ....:     A_rows = A.view([('', A.dtype)] * A.shape[1])
   ....:     B_rows = B.view([('', B.dtype)] * B.shape[1])
   ....:     return np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])
   ....: 

In [93]: %timeit cal_diff(A, B)
10000 loops, best of 3: 54.1 µs per loop

In [94]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100000 loops, best of 3: 9.41 µs per loop

# Even better with Divakar's suggestion
In [97]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100000 loops, best of 3: 7.41 µs per loop

Well, if you are looking for a faster way you should looking for ways that reduce the number of comparisons. In this case (without considering the order) you can generate a unique number from your rows and compare the numbers which can be done with summing the items power of two.

Here is the benchmark with Divakar's in1d approach:

In [144]: def in1d_approach(A,B):
   .....:         dims = np.maximum(B.max(0),A.max(0))+1
   .....:         return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
   .....:                          np.ravel_multi_index(B.T,dims))]
   .....: 

In [146]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 23.8 µs per loop

In [145]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 20.2 µs per loop

You can use np.diff to get the an order independent result:

In [194]: B=np.array([[0, 0, 0,], [1, 0, 2,], [1, 0, 3,], [1, 0, 4,], [1, 1, 0,], [1, 1, 1,], [1, 1, 4,], [4, 1, 1]])

In [195]: A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
Out[195]: 
array([[1, 1, 2],
       [1, 1, 3]])

In [196]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 30.7 µs per loop

Benchmark with Divakar's setup:

In [198]: B = np.random.randint(0,9,(1000,3))

In [199]: A = np.random.randint(0,9,(100,3))

In [200]: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)

In [201]: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)

In [202]: A[A_idx] = B[B_idx]

In [203]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 137 µs per loop

In [204]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 112 µs per loop

In [205]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 115 µs per loop

Timing with larger arrays (Divakar's solution is slightly faster):

In [231]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
1000 loops, best of 3: 1.01 ms per loop

In [232]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
1000 loops, best of 3: 880 µs per loop

In [233]:  %timeit in1d_approach(A, B)
1000 loops, best of 3: 807 µs per loop

answered Oct 17 '22 12:10

Mazdak

If you want to do it the numpy way,

import numpy as np

A = np.array([[1, 1, 1,], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4], [1, 1, 0], [1, 1, 1], [1, 1, 4]])
A_rows = A.view([('', A.dtype)] * A.shape[1])
B_rows = B.view([('', B.dtype)] * B.shape[1])

diff_array = np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])

As @Rahul suggested, for a non numpy easy solution,

diff_array = [i for i in A if i not in B]

answered Oct 17 '22 12:10

R. S. Nikhil Krishna

Related questions
                            
                                Using Django settings in templates [duplicate]
                            
                                Detect socket hangup without sending or receiving?
                            
                                Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?
                            
                                Is it possible to show the exact position in Sublime Text 2?
                            
                                Pandas: Sorting columns by their mean value
                            
                                list comprehension replace for loop in 2D matrix
                            
                                'function' object has no attribute 'name' when registering blueprint
                            
                                `ValueError: A value in x_new is above the interpolation range.` - what other reasons than not ascending values?
                            
                                Auto __repr__ method
                            
                                Django: Overriding __init__ for Custom Forms
                            
                                Display graph without saving using pydot
                            
                                What is the simplest way to swap each pair of adjoining chars in a string with Python?
                            
                                Python vs Matlab [closed]
                            
                                How to get the difference of two querysets in Django?
                            
                                csv.write skipping lines when writing to csv
                            
                                How to get the highest element in absolute value in a numpy matrix?
                            
                                Python to automatically select serial ports (for Arduino)
                            
                                Capturing high multi-collinearity in statsmodels
                            
                                Unmelt Pandas DataFrame
                            
                                Django Rest Framework model Id field in nested relationship serializer

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Removing elements from an array that are in another array

Tags:

python

arrays

numpy

Jee Seok Yoon

People also ask

4 Answers

Rahul K P

Divakar

Mazdak

R. S. Nikhil Krishna

Recent Activity

Donate For Us