
Sort rows of an array to match the order of another array using an identifier column

I have two arrays like this:

A = [[111, ...],          B = [[222, ...],
     [222, ...],               [111, ...],
     [333, ...],               [333, ...],
     [555, ...]]               [444, ...],
                               [555, ...]]

The first column contains identifiers and the remaining columns hold data; B has many more columns than A. The identifiers are unique. A can have fewer rows than B, so in some cases empty spacer rows are necessary.
I am looking for an efficient way to match the rows of matrix A to matrix B, so that the result looks like this:

A = [[222, ...],
     [111, ...],
     [333, ...],
     [nan, nan], #could be any unused value
     [555, ...]]

I could just sort both matrices or write a for loop, but both approaches seem clumsy... Are there better implementations?

Dahlai asked Oct 28 '25 10:10



1 Answer

Here's a vectorized approach using np.searchsorted -

import numpy as np

# Indices that sort A by its identifier column (col-0)
sidx = A[:, 0].argsort()

# For each identifier in B's col-0, find its left insertion position
# in the sorted col-0 of A
l_idx = np.searchsorted(A[:, 0], B[:, 0], sorter=sidx)

# An identifier of B is present in A exactly when its left and right
# insertion positions differ
valid_mask = l_idx != np.searchsorted(A[:, 0], B[:, 0], sorter=sidx, side='right')

# Initialize the output array with NaNs (one row per row of B), then copy
# the matching rows of A into the positions selected by valid_mask
out = np.full((B.shape[0], A.shape[1]), np.nan)
out[valid_mask] = A[sidx[l_idx[valid_mask]]]

Note that valid_mask could also be created with np.in1d, as np.in1d(B[:,0],A[:,0]), which reads more intuitively (the NumPy docs recommend np.isin over np.in1d for new code). We use np.searchsorted here because it performs better, as discussed in greater detail in this other solution.
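To make the equivalence of the two masks concrete, here is a small check that builds the mask both ways on the arrays from the sample run and compares them. It uses np.isin, the membership test the NumPy docs recommend in place of np.in1d; the variable names mask_searchsorted and mask_isin are made up for this sketch.

```python
import numpy as np

A = np.array([[45, 11, 86],
              [18, 74, 59],
              [30, 68, 13],
              [55, 47, 78]])
B = np.array([[45, 11, 88],
              [55, 83, 46],
              [95, 87, 77],
              [30,  9, 37],
              [14, 97, 98],
              [18, 48, 53]])

# Mask built from the two searchsorted calls: an identifier is present
# in A when its left and right insertion positions differ
sidx = A[:, 0].argsort()
left = np.searchsorted(A[:, 0], B[:, 0], sorter=sidx)
right = np.searchsorted(A[:, 0], B[:, 0], sorter=sidx, side='right')
mask_searchsorted = left != right

# Mask built from a direct membership test
mask_isin = np.isin(B[:, 0], A[:, 0])

print(mask_isin)
print(np.array_equal(mask_searchsorted, mask_isin))
```

Both masks flag the rows of B with identifiers 45, 55, 30 and 18, so either one can be used to select the output rows; only the searchsorted version also yields the positions needed to fetch the rows of A.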

Sample run -

In [184]: A
Out[184]: 
array([[45, 11, 86],
       [18, 74, 59],
       [30, 68, 13],
       [55, 47, 78]])

In [185]: B
Out[185]: 
array([[45, 11, 88],
       [55, 83, 46],
       [95, 87, 77],
       [30,  9, 37],
       [14, 97, 98],
       [18, 48, 53]])

In [186]: out
Out[186]: 
array([[ 45.,  11.,  86.],
       [ 55.,  47.,  78.],
       [ nan,  nan,  nan],
       [ 30.,  68.,  13.],
       [ nan,  nan,  nan],
       [ 18.,  74.,  59.]])
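For reuse, the steps can be wrapped in a function. The following is only a sketch: the name align_rows and its fill parameter are hypothetical, and the data columns are invented to mirror the identifiers from the question (111, 222, 333, 555 in A and an extra 444 in B). It assumes the identifiers in A are unique, as stated in the question.

```python
import numpy as np

def align_rows(A, B, fill=np.nan):
    """Reorder rows of A to follow the identifier order in B[:, 0].

    Rows of B whose identifier is missing from A become rows of `fill`.
    Assumes the identifiers in A[:, 0] are unique.
    """
    ids_a, ids_b = A[:, 0], B[:, 0]
    sidx = ids_a.argsort()
    left = np.searchsorted(ids_a, ids_b, sorter=sidx)
    right = np.searchsorted(ids_a, ids_b, sorter=sidx, side='right')
    valid = left != right
    # Float output so that NaN can serve as the spacer value
    out = np.full((B.shape[0], A.shape[1]), fill, dtype=float)
    out[valid] = A[sidx[left[valid]]]
    return out

# Made-up data: A has one data column, B has three
A = np.array([[111, 1], [222, 2], [333, 3], [555, 5]])
B = np.array([[222, 0, 0, 0],
              [111, 0, 0, 0],
              [333, 0, 0, 0],
              [444, 0, 0, 0],
              [555, 0, 0, 0]])

out = align_rows(A, B)
print(out)
```

The result has one row per row of B, in the order 222, 111, 333, spacer, 555. Since NaN only exists for floating-point types, the output is always a float array even when A holds integers; pass a different fill value if NaN is not wanted.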
Divakar answered Oct 30 '25 01:10



