I have a pandas DataFrame, df.
I want to extract a list of all the (col, index) pairs in the df for which the value at (col, index) is greater than 0.95.
Additionally, I only want pairs in the lower triangle of the df, not including the diagonal itself. (If it helps, it's a correlation df, so the diagonal entries are all 1's, which I am not interested in.)
How can I do this?
In [71]: df = DataFrame(np.arange(25).reshape(5,5))
In [72]: df
Out[72]:
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
This masks out the upper triangle (including the diagonal):
In [73]: mask = np.ones(df.shape,dtype='bool')
In [74]: mask[np.triu_indices(len(df))] = False
In [75]: mask
Out[75]:
array([[False, False, False, False, False],
       [ True, False, False, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False],
       [ True,  True,  True,  True, False]], dtype=bool)
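As an aside, the same mask can probably be built in one step with `np.tril`, where the offset `k=-1` excludes the diagonal. This is only a sketch of an alternative, not part of the session above; the names `n` and `mask_alt` are made up for illustration.

```python
import numpy as np

n = 5  # size of the square frame used in this example (assumed)

# Mask built as in the session: start all True, clear the upper triangle
mask = np.ones((n, n), dtype=bool)
mask[np.triu_indices(n)] = False

# Alternative: keep only entries strictly below the main diagonal (k=-1)
mask_alt = np.tril(np.ones((n, n), dtype=bool), k=-1)

# Both constructions should give the same boolean array
same = bool((mask == mask_alt).all())
```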
Simulate your condition (> 0.95); this integer example uses > 16 instead:
In [76]: df>16
Out[76]:
       0      1      2      3      4
0  False  False  False  False  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False   True   True   True
4   True   True   True   True   True
This is probably the form in which you want the result:
In [77]: df[(df>16)&mask]
Out[77]:
     0    1    2    3    4
0  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN   17  NaN  NaN
4   20   21   22   23  NaN
If you really want the positional (row, column) values:
In [78]: x = ((df>16)&mask).values.nonzero()
In [79]: list(zip(x[0],x[1]))
Out[79]: [(3, 2), (4, 0), (4, 1), (4, 2), (4, 3)]
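Since the question asks for labels rather than positions, here is a minimal sketch that wraps the steps above in a function and maps the positions back to index and column labels. The function name `lower_triangle_pairs` and its `threshold` parameter are hypothetical, chosen only for this example, and it assumes a square frame such as a correlation matrix.

```python
import numpy as np
import pandas as pd


def lower_triangle_pairs(df, threshold):
    """Return (index_label, column_label) pairs strictly below the
    diagonal whose value is greater than threshold."""
    # True only strictly below the main diagonal (k=-1 drops the diagonal)
    below_diag = np.tril(np.ones(df.shape, dtype=bool), k=-1)
    # Combine the value condition with the triangle mask
    hits = (df.to_numpy() > threshold) & below_diag
    # Positions of the True entries, in row-major order
    rows, cols = np.nonzero(hits)
    # Translate positions into the frame's own labels
    return list(zip(df.index[rows], df.columns[cols]))


# Same toy frame as in the session, with 16 standing in for 0.95
df = pd.DataFrame(np.arange(25).reshape(5, 5))
pairs = lower_triangle_pairs(df, 16)
print(pairs)
```

For a real correlation frame, the call would look like `lower_triangle_pairs(corr, 0.95)`, where `corr` is assumed to be the result of something like `data.corr()`. The pairs come back as (index, column); swap the two arguments to `zip` if you want (column, index) instead.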