Set diagonal triangle in pandas DataFrame to NaN

Given the below dataframe:

import pandas as pd
import numpy as np
a = np.arange(16).reshape(4, 4)
df = pd.DataFrame(data=a, columns=['a','b','c','d'])

I'd like to produce the following result:

df([[ NaN,  1,  2,  3],
    [ NaN,  NaN,  6,  7],
    [ NaN,  NaN,  NaN, 11],
    [ NaN,  NaN,  NaN,  NaN]])

So far I've tried using np.tril_indices, but it only works once the df is converted back to a numpy array, and only for integer assignments (not np.nan):

il1 = np.tril_indices(4)
a[il1] = 0

gives:

array([[ 0,  1,  2,  3],
       [ 0,  0,  6,  7],
       [ 0,  0,  0, 11],
       [ 0,  0,  0,  0]])

...which is almost what I'm looking for, but assigning np.nan instead of 0 raises:

ValueError: cannot convert float NaN to integer

while:

df[il1] = 0

gives:

TypeError: unhashable type: 'numpy.ndarray'

So if I want to fill the bottom triangle of a dataframe with NaN: 1) does it have to be a numpy array, or can I do this with pandas directly? And 2) is there a way to fill the bottom triangle with NaN other than using numpy.fill_diagonal and incrementing the offset row by row down the whole DataFrame?

Another failed solution: filling the lower triangle of the np array with zeros, then masking on zero and reassigning to np.nan. That also converts the zero values above the diagonal to NaN, when they should be preserved as zero!

Thomas Matthew asked Nov 19 '16 08:11



2 Answers

You need to cast a to float first, because NaN is a float and cannot be stored in an integer array:

import numpy as np
import pandas as pd
a = np.arange(16).reshape(4, 4).astype(float)
print (a)
[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]]


il1 = np.tril_indices(4)
a[il1] = np.nan
print (a)
[[ nan   1.   2.   3.]
 [ nan  nan   6.   7.]
 [ nan  nan  nan  11.]
 [ nan  nan  nan  nan]]

df = pd.DataFrame(data=a, columns=['a','b','c','d'])
print (df)
    a    b    c     d
0 NaN  1.0  2.0   3.0
1 NaN  NaN  6.0   7.0
2 NaN  NaN  NaN  11.0
3 NaN  NaN  NaN   NaN
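
On the question of whether this can be done on the DataFrame directly: one possible sketch, not part of the answer above, is to build a boolean lower-triangle mask with np.tril and pass it to DataFrame.mask. The names `lower` and `result` are only illustrative. This assumes the frame should be returned as a new object with the integer columns upcast to float.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["a", "b", "c", "d"])

# Boolean mask that is True on and below the main diagonal.
lower = np.tril(np.ones(df.shape, dtype=bool))

# DataFrame.mask replaces entries where the condition is True (NaN by default)
# and returns a new frame; the original df is left unchanged.
result = df.mask(lower)
print(result)
```

Because the mask is built from df.shape, the same lines should also work for non-square frames.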
jezrael answered Nov 14 '22 23:11



An approach using np.where -

m,n = df.shape
df[:] = np.where(np.arange(m)[:,None] >= np.arange(n),np.nan,df)

Sample run -

In [93]: df
Out[93]: 
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

In [94]: m,n = df.shape

In [95]: df[:] = np.where(np.arange(m)[:,None] >= np.arange(n),np.nan,df)

In [96]: df
Out[96]: 
    a    b    c     d
0 NaN  1.0  2.0   3.0
1 NaN  NaN  6.0   7.0
2 NaN  NaN  NaN  11.0
3 NaN  NaN  NaN   NaN
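
To make the broadcasting step explicit, here is a small sketch of the mask that the comparison above produces. The names `rows`, `cols`, `lower`, `out` and `strict` are illustrative only. It uses DataFrame.where, which returns a new frame, instead of the in-place `df[:] = ...` assignment; that is a substitution on my part, chosen to avoid writing NaN into integer columns in place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
m, n = df.shape

rows = np.arange(m)[:, None]  # column vector of row numbers, shape (m, 1)
cols = np.arange(n)           # row vector of column numbers, shape (n,)

# Broadcasting compares every row number with every column number,
# giving an (m, n) boolean array that is True on and below the diagonal.
lower = rows >= cols

# DataFrame.where keeps values where the condition is True, so invert the mask.
out = df.where(~lower)

# Using > instead of >= leaves the diagonal itself untouched.
strict = df.where(~(rows > cols))
print(out)
print(strict)
```

Switching between `>=` and `>` (or comparing against `cols + k` for some offset k) selects which diagonal the NaN region starts from.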
Divakar answered Nov 14 '22 22:11
