I have two arrays like this:
A = [[111, ...],
     [222, ...],
     [333, ...],
     [555, ...]]

B = [[222, ...],
     [111, ...],
     [333, ...],
     [444, ...],
     [555, ...]]
The first column contains identifiers and the remaining columns hold data; B has many more columns than A. The identifiers are unique. A can have fewer rows than B, so in some cases empty spacer rows would be necessary.
I am looking for an efficient way to match the rows of matrix A to matrix B, so that the result would look like this:
A = [[222, ...],
[111, ...],
[333, ...],
[nan, nan], #could be any unused value
[555, ...]]
I could just sort both matrices or write a for loop, but both approaches seem clumsy... Are there better implementations?
Here's a vectorized approach using np.searchsorted -
import numpy as np
# Store the indices that sort A by its identifier column (col-0)
sidx = A[:,0].argsort()
# Find the indices of col-0 of B in col-0 of sorted A
l_idx = np.searchsorted(A[:,0],B[:,0],sorter = sidx)
# Create a mask marking which identifiers in B's col-0 also occur in A's col-0.
# For a match, the left and right insertion points differ; otherwise they are equal.
valid_mask = l_idx != np.searchsorted(A[:,0],B[:,0],sorter = sidx,side='right')
# Initialize output array with NaNs.
# Use l_idx to set rows from A into output array. Use valid_mask to select
# indices from l_idx and output rows that are to be set.
out = np.full((B.shape[0],A.shape[1]),np.nan)
out[valid_mask] = A[sidx[l_idx[valid_mask]]]
Please note that valid_mask could also be created with np.in1d, as np.in1d(B[:,0],A[:,0]) (spelled np.isin in newer NumPy versions), which reads more intuitively. We use np.searchsorted here because it performs better, as discussed in greater detail in this other solution.
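As a minimal sketch of that alternative, the two ways of building the mask can be compared on small made-up arrays. The sample values below are assumptions for illustration only, and np.isin is used as the newer spelling of np.in1d.

```python
import numpy as np

# Made-up sample data (not from the post); column 0 holds unique identifiers.
A = np.array([[3, 30],
              [1, 10],
              [2, 20]])
B = np.array([[2, 0],
              [9, 0],
              [1, 0],
              [3, 0]])

# Mask via two searchsorted calls: an identifier of B is present in A
# exactly when its left and right insertion points differ.
sidx = A[:, 0].argsort()
left = np.searchsorted(A[:, 0], B[:, 0], side='left', sorter=sidx)
right = np.searchsorted(A[:, 0], B[:, 0], side='right', sorter=sidx)
mask_searchsorted = left != right

# Same mask via a membership test.
mask_isin = np.isin(B[:, 0], A[:, 0])

print(mask_searchsorted)
print(mask_isin)
```

Both masks should agree; the membership test is shorter to read, while the searchsorted version reuses the left indices that are needed anyway to pick rows from A.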
Sample run -
In [184]: A
Out[184]:
array([[45, 11, 86],
[18, 74, 59],
[30, 68, 13],
[55, 47, 78]])
In [185]: B
Out[185]:
array([[45, 11, 88],
[55, 83, 46],
[95, 87, 77],
[30, 9, 37],
[14, 97, 98],
[18, 48, 53]])
In [186]: out
Out[186]:
array([[ 45., 11., 86.],
[ 55., 47., 78.],
[ nan, nan, nan],
[ 30., 68., 13.],
[ nan, nan, nan],
[ 18., 74., 59.]])