Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

check if numpy array is subset of another array

Tags:

python

set

numpy

Similar questions have already been asked on SO, but they have more specific constraints and their answers don't apply to my question.

Generally speaking, what is the most pythonic way to determine if an arbitrary numpy array is a subset of another array? More specifically, I have a roughly 20000x3 array and I need to know the indices of the 1x3 elements that are entirely contained within a set. More generally, is there a more pythonic way of writing the following:

master = [12, 155, 179, 234, 670, 981, 1054, 1209, 1526, 1667, 1853]  # some indices of interest
triangles = np.random.randint(2000, size=(20000, 3))  # some data

for i, x in enumerate(triangles):
    if x[0] in master and x[1] in master and x[2] in master:
        print i

For my use case, I can safely assume that len(master) << 20000. (Consequently, it is also safe to assume that master is sorted because this is cheap).

like image 526
aestrivex Avatar asked Dec 16 '22 10:12

aestrivex


2 Answers

You can do this easily via iterating over an array in list comprehension. A toy example is as follows:

import numpy as np
x = np.arange(30).reshape(10,3)
searchKey = [4,5,8]
x[[0,3,7],:] = searchKey
x

gives

 array([[ 4,  5,  8],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 4,  5,  8],
        [12, 13, 14],
        [15, 16, 17],
        [18, 19, 20],
        [ 4,  5,  8],
        [24, 25, 26],
        [27, 28, 29]])

Now iterate over the elements:

ismember = [row==searchKey for row in x.tolist()]

The result is

[True, False, False, True, False, False, False, True, False, False]

You can modify it for being a subset as in your question:

searchKey = [2,4,10,5,8,9]  # Add more elements for testing
setSearchKey = set(searchKey)
ismember = [setSearchKey.issuperset(row) for row in x.tolist()]

If you need the indices, then use

np.where(ismember)[0]

It gives

array([0, 3, 7])
like image 140
petrichor Avatar answered Dec 18 '22 12:12

petrichor


One can also use np.isin which might be more efficient than the list comprehension in @petrichor's answer. Using the same set up:

import numpy as np

x = np.arange(30).reshape(10, 3)
searchKey = [4, 5, 8]
x[[0, 3, 7], :] = searchKey
array([[ 4,  5,  8],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 4,  5,  8],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [ 4,  5,  8],
       [24, 25, 26],
       [27, 28, 29]])

Now one can use np.isin; by default, it will work element wise:

np.isin(x, searchKey)
array([[ True,  True,  True],
       [False,  True,  True],
       [False, False,  True],
       [ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [False, False, False],
       [False, False, False]])

We now have to filter the rows where all entries evaluate to True for which we could use all:

np.isin(x, searchKey).all(1)
array([ True, False, False,  True, False, False, False,  True, False,
       False])

If one now wants the corresponding indices, one can use np.where:

np.where(np.isin(x, searchKey).all(1))
(array([0, 3, 7]),)

EDIT:

Just realize that one has to be careful though. For example, if I do

x[4, :] = [8, 4, 5]

so, in the assignment I use the same values as in searchKey but in a different order, I will still get it returned when doing

np.where(np.isin(x, searchKey).all(1))

which prints

(array([0, 3, 4, 7]),)

That can be undesired.

like image 40
Cleb Avatar answered Dec 18 '22 12:12

Cleb