Numpy Array Set Difference [duplicate]

Question

I have two numpy arrays that have overlapping rows:

import numpy as np

a = np.array([[1,2], [1,5], [3,4], [3,5], [4,1], [4,6]])
b = np.array([[1,5], [3,4], [4,6]])

You can assume that:

the rows are sorted
the rows within each array are unique
array b is always a subset of array a

I would like to get an array that contains all rows of a that are not in b.

i.e.:

[[1 2]
 [3 5]
 [4 1]]

Considering that a and b can be very, very large, what is the most efficient method for solving this problem?

BPL · Accepted Answer

Here's a possible solution to your problem:

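import numpy as np

a = np.array([[1, 2], [3, 4], [3, 5], [4, 1], [4, 6]])
b = np.array([[3, 4], [4, 6]])

# View each row as one structured element so whole rows are compared at once.
a_rows = a.view([('', a.dtype)] * a.shape[1])
b_rows = b.view([('', b.dtype)] * b.shape[1])
# Take the set difference of the rows, then convert back to a plain 2-D array.
c = np.setdiff1d(a_rows, b_rows).view(a.dtype).reshape(-1, a.shape[1])
print(c)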

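I think numpy.setdiff1d is the right choice here: once each row is viewed as a single structured element, it compares whole rows rather than individual values.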
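If you want to reuse this, the same idea can be wrapped in a small helper. This is only a sketch: the name row_setdiff is made up for illustration, and it assumes both inputs are 2-D with the same number of columns. The np.ascontiguousarray calls are an addition of mine, since the view trick needs C-contiguous memory and a matching dtype. I have not benchmarked it on very large arrays.

```python
import numpy as np


def row_setdiff(a, b):
    """Return the rows of 2-D array `a` that do not appear in 2-D array `b`.

    Hypothetical helper: assumes `a` and `b` have the same number of columns.
    """
    # The structured view requires C-contiguous data and a shared dtype.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)

    # One structured element per row, with one field per column.
    row_dtype = [('', a.dtype)] * a.shape[1]
    a_rows = a.view(row_dtype)
    b_rows = b.view(row_dtype)

    # setdiff1d flattens its inputs and returns the unique rows of `a`
    # that are missing from `b`, in sorted order.
    diff = np.setdiff1d(a_rows, b_rows)

    # Convert the structured result back to a plain 2-D array.
    return diff.view(a.dtype).reshape(-1, a.shape[1])


a = np.array([[1, 2], [1, 5], [3, 4], [3, 5], [4, 1], [4, 6]])
b = np.array([[1, 5], [3, 4], [4, 6]])
result = row_setdiff(a, b)
print(result)

# Cross-check against a plain Python set difference on row tuples.
expected = sorted(set(map(tuple, a.tolist())) - set(map(tuple, b.tolist())))
print([tuple(row) for row in result.tolist()] == expected)
```

The pure-Python check at the end is only there to confirm the result on small inputs; it would be far too slow for the large arrays mentioned in the question.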

Tags: python, arrays, numpy

slaw
