Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python version of ismember with 'rows' and index

The similar question has been asked, but none of the answers quite do what I need - some allow multidimensional searches (aka 'rows' option in matlab) but dont return the index. Some return the index but dont allow rows. My arrays are very large (1M x 2) and I have bee successful in making a loop that works, but obviously that is very slow. In matlab, the built-in ismember function takes about 10 seconds.

Here is what I am looking for:

a=np.array([[4, 6],[2, 6],[5, 2]])

b=np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])

The exact matlab function that does the trick is:

[~,index] = ismember(a,b,'rows')

where

index = [6, 3, 9] 
like image 208
claudiaann1 Avatar asked Nov 01 '22 02:11

claudiaann1


2 Answers

import numpy as np

def asvoid(arr):
    """
    View the array as dtype np.void (bytes)
    This views the last axis of ND-arrays as bytes so you can perform comparisons on
    the entire row.
    http://stackoverflow.com/a/16840350/190597 (Jaime, 2013-05)
    Warning: When using asvoid for comparison, note that float zeros may compare UNEQUALLY
    >>> asvoid([-0.]) == asvoid([0.])
    array([False], dtype=bool)
    """
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))


def in1d_index(a, b):
    voida, voidb = map(asvoid, (a, b))
    return np.where(np.in1d(voidb, voida))[0]    

a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])

print(in1d_index(a, b))

prints

[2 5 8]

This would be equivalent to Matlab's [3, 6, 9], since Python uses 0-based indexing.

Some caveats:

  1. The indices are returned in increasing order. They do not correspond to the location of the items of a in b.
  2. asvoid will work for integer dtypes, but be careful if using asvoid on float dtypes, since asvoid([-0.]) == asvoid([0.]) returns array([False]).
  3. asvoid works best on contiguous arrays. If the arrays are not contiguous, the data will be copied to a contiguous array, which will slow down the performance.

Despite the caveats, one might choose to use in1d_index anyway for the sake of speed:

def ismember_rows(a, b):
    # http://stackoverflow.com/a/22705773/190597 (ashg)
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]

In [41]: a2 = np.tile(a,(2000,1))
In [42]: b2 = np.tile(b,(2000,1))

In [46]: %timeit in1d_index(a2, b2)
100 loops, best of 3: 8.49 ms per loop

In [47]: %timeit ismember_rows(a2, b2)
1 loops, best of 3: 5.55 s per loop

So in1d_index is ~650x faster (for arrays of length in the low thousands), but again note the comparison is not exactly apples-to-apples since in1d_index returns the indices in increasing order, while ismember_rows returns the indices in the order rows of a show up in b.

like image 72
unutbu Avatar answered Nov 15 '22 04:11

unutbu


import numpy as np 
def ismember_rows(a, b):
    '''Equivalent of 'ismember' from Matlab
    a.shape = (nRows_a, nCol)
    b.shape = (nRows_b, nCol)
    return the idx where b[idx] == a
    '''
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]

a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
idx = ismember_rows(a, b)
print idx
print np.all(b[idx] == a)

print

array([5, 2, 8])
True

e...I used broadcasting

--------------------------[update]------------------------------

def ismember(a, b):
    return np.flatnonzero(np.in1d(b[:,0], a[:,0]) & np.in1d(b[:,1], a[:,1]))

a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
a2 = np.tile(a,(2000,1))
b2 = np.tile(b,(2000,1))

%timeit timeit in1d_index(a2, b2)
# 100 loops, best of 3: 8.74 ms per loop
%timeit ismember(a2, b2)
# 100 loops, best of 3: 8.5 ms per loop

np.all(in1d_index(a2, b2) == ismember(a2, b2))
# True

as what unutbu said, the indices are returned in increasing order

like image 43
ashg Avatar answered Nov 15 '22 05:11

ashg