I have a MultiIndex DataFrame with the following structure:
          0     1     2   ref
A  B
21 45  0.01  0.56  0.23  0.02
22 45  0.30  0.88  0.53  0.87
23 46  0.45  0.23  0.90  0.23
What I want to do is this: from columns 0 to 2, choose for each row the value closest to the column 'ref'. The expected result would be:
       closest
A  B
21 45     0.01
22 45     0.88
23 46     0.23
Reconstructing your DataFrame:
In [1]: index = MultiIndex.from_tuples(zip([21,22,23],[45,45,46]), names=['A', 'B'])
In [2]: df = DataFrame({0:[0.01, 0.30, 0.45],
1:[0.56, 0.88, 0.23],
2:[0.23, 0.53, 0.90],
'ref': [0.02, 0.87, 0.23]}, index=index)
In [3]: df
Out[3]:
          0     1     2   ref
A  B
21 45  0.01  0.56  0.23  0.02
22 45  0.30  0.88  0.53  0.87
23 46  0.45  0.23  0.90  0.23
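The session above assumes that MultiIndex and DataFrame have already been imported. As a standalone script, the same frame could be built roughly as follows. This is only a sketch: the `pd` alias and the explicit `list(...)` around `zip` are my own choices (the latter as a precaution on Python 3), not something the original session required.

```python
import pandas as pd

# Row labels: pairs of (A, B) values, materialised as a list of tuples.
index = pd.MultiIndex.from_tuples(
    list(zip([21, 22, 23], [45, 45, 46])), names=['A', 'B']
)

# Candidate columns 0, 1, 2 plus the reference column.
df = pd.DataFrame(
    {0: [0.01, 0.30, 0.45],
     1: [0.56, 0.88, 0.23],
     2: [0.23, 0.53, 0.90],
     'ref': [0.02, 0.87, 0.23]},
    index=index,
)

print(df)
```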
I would first get the absolute distance of columns 0, 1 and 2 from ref:
In [4]: dist = df[[0,1,2]].sub(df['ref'], axis=0).apply(np.abs)
In [5]: dist
Out[5]:
          0     1     2
A  B
21 45  0.01  0.54  0.21
22 45  0.57  0.01  0.34
23 46  0.22  0.00  0.67
Given dist, you can now determine, for each row, the column holding the minimum value using DataFrame.idxmin:
In [6]: idx = dist.idxmin(axis=1)
In [7]: idx
Out[7]:
A   B
21  45    0
22  45    1
23  46    1
To generate the new closest column, you then simply need to use idx to index df (.loc is used here because .ix has been removed from recent pandas versions):
In [8]: df['closest'] = idx.index.map(lambda x: df.loc[x, idx.loc[x]])
In [9]: df
Out[9]:
          0     1     2   ref  closest
A  B
21 45  0.01  0.56  0.23  0.02     0.01
22 45  0.30  0.88  0.53  0.87     0.88
23 46  0.45  0.23  0.90  0.23     0.23
For the last step, there might be a more elegant way to do it but I'm relatively new to Pandas and that's the best I can think of right now.
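One possible alternative for that last step, offered as a sketch rather than a definitive answer, is to drop down to NumPy and pick one value per row by position. The names `cols`, `values` and `pos` are mine, and the example rebuilds the frame so it runs on its own.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    list(zip([21, 22, 23], [45, 45, 46])), names=['A', 'B']
)
df = pd.DataFrame(
    {0: [0.01, 0.30, 0.45],
     1: [0.56, 0.88, 0.23],
     2: [0.23, 0.53, 0.90],
     'ref': [0.02, 0.87, 0.23]},
    index=index,
)

cols = [0, 1, 2]
values = df[cols].to_numpy()

# Absolute distance of each candidate column from 'ref', row by row.
dist = np.abs(values - df['ref'].to_numpy()[:, None])

# Position of the smallest distance in each row (the first one wins on ties).
pos = dist.argmin(axis=1)

# Select exactly one value per row at that position.
df['closest'] = values[np.arange(len(df)), pos]

print(df)
```

This avoids the per-row lambda, which may matter on large frames, at the cost of working with positions instead of column labels.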