
pandas: get elements (index, col) below diagonal in DataFrame

I have a pandas DataFrame, df.

I want to extract a list of all the (col, index) pairs in df for which the value at (col, index) is greater than 0.95.

Additionally, I only want pairs from the lower triangle of df, excluding the diagonal itself. (If it helps, df is a correlation matrix, so the diagonal entries are all 1, which is not what I am interested in.)

How can I do this?

wolfsatthedoor asked Oct 21 '14 02:10


1 Answer

Assuming numpy is imported as np and DataFrame is imported from pandas:

In [71]: df = DataFrame(np.arange(25).reshape(5,5))

In [72]: df
Out[72]: 
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24

This masks out the upper triangle (including the diagonal), leaving True only strictly below the diagonal:

In [73]: mask = np.ones(df.shape,dtype='bool')

In [74]: mask[np.triu_indices(len(df))] = False

In [75]: mask
Out[75]: 
array([[False, False, False, False, False],
       [ True, False, False, False, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False],
       [ True,  True,  True,  True, False]], dtype=bool)
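
The same strictly-lower-triangle mask can also be built in one step with np.tril and a negative diagonal offset. This is only a sketch of an alternative, assuming the same 5x5 example frame; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5))

# k=-1 zeroes out the main diagonal and everything above it,
# so only entries strictly below the diagonal stay True.
mask = np.tril(np.ones(df.shape, dtype=bool), k=-1)
```

A 5x5 frame has 10 entries strictly below the diagonal, so mask.sum() should be 10.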

Simulating your condition (> 0.95), using > 16 on this example frame:

In [76]: df>16
Out[76]: 
       0      1      2      3      4
0  False  False  False  False  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False   True   True   True
4   True   True   True   True   True

This is probably the form you want the result in:

In [77]: df[(df>16)&mask] 
Out[77]: 
    0   1   2   3   4
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN  17 NaN NaN
4  20  21  22  23 NaN

If you really want the positional (row, column) values:

In [78]: x = ((df>16)&mask).values.nonzero()

In [79]: list(zip(x[0].tolist(), x[1].tolist()))  # zip is lazy on Python 3, so wrap it in list()
Out[79]: [(3, 2), (4, 0), (4, 1), (4, 2), (4, 3)]
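
The question asks for (col, index) labels rather than integer positions. One way to map the positions back to labels is sketched below. The 4x4 correlation matrix and the labels a to d are made up for illustration, and the mask is built with np.tril, which is one option among several.

```python
import numpy as np
import pandas as pd

# Hypothetical correlation matrix with labelled rows and columns
labels = ["a", "b", "c", "d"]
corr = pd.DataFrame(
    [[1.00, 0.97, 0.10, 0.96],
     [0.97, 1.00, 0.20, 0.30],
     [0.10, 0.20, 1.00, 0.99],
     [0.96, 0.30, 0.99, 1.00]],
    index=labels, columns=labels,
)

# True only strictly below the diagonal
mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)

# Positions where the value exceeds the threshold and lies below the diagonal
rows, cols = np.nonzero((corr > 0.95).to_numpy() & mask)

# Translate positions into (col, index) label pairs
pairs = [(corr.columns[c], corr.index[r]) for r, c in zip(rows, cols)]
print(pairs)  # [('a', 'b'), ('a', 'd'), ('c', 'd')]
```

Because a correlation matrix is symmetric, restricting to the lower triangle reports each highly correlated pair exactly once.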
Jeff answered Sep 28 '22 02:09