I have the following square DataFrame:
In [104]: d
Out[104]:
a b c d e
a inf 5.909091 8.636364 7.272727 4.454545
b 7.222222 inf 8.666667 7.666667 1.777778
c 15.833333 13.000000 inf 9.166667 14.666667
d 4.444444 3.833333 3.055556 inf 4.833333
e 24.500000 8.000000 44.000000 43.500000 inf
this is modified distance matrix, representing pairwise distance between objects ['a','b','c','d','e'], where each row is divided by a coefficient (weight) and all diagonal elements artificially set to np.inf
.
How may I get a list/vector of indices like as follows in an efficient (vectorized) way:
d # index of minimal element in the column `a`
a # index of minimal element in the column `b` (excluding already found indices: [d])
b # index of minimal element in the column `c` (excluding already found indices: [d,a])
c # index of minimal element in the column `d` (excluding already found indices: [d,a,b])
I.e. in the first column we had found index d
, so when we search for a minimum in the second column we are excluding row with index d
(found previously in the first column) - this would be a
.
When we are looking for the minimum in the third column we are excluding rows with indices found previously (['d','a']
) - this would be b
.
When we are looking for the minimum in the fourth column we are excluding rows with indices found previously (['d','a','b']
) - this would be c
.
I don't need diagonal (inf
) elements, so the resulting list/vector will contain d.shape[0] - 1
elements.
I.e. the resulting list will look like: ['d','a','b','c']
or in case of Numpy solution the corresponding numerical indices: [3,0,1,2]
It's not a problem to do it using slow for loop
solution, but I can't wrap my head around a vectorized (fast) solution...
A loop is the only solution I can see here.
But you can use numpy
+ numba
to optimise.
from numba import jit
@jit(nopython=True)
def get_min_lookback(A, res):
for i in range(A.shape[1]):
res[i] = np.argmin(A[:, i])
A[res[i], :] = np.inf
return res
arr = df.values
get_min_lookback(arr, np.zeros(arr.shape[1], dtype=int))
# array([3, 0, 1, 2, 0])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With