Set diagonal triangle in pandas DataFrame to NaN

Question

Given the below dataframe:

import pandas as pd
import numpy as np
a = np.arange(16).reshape(4, 4)
df = pd.DataFrame(data=a, columns=['a','b','c','d'])

I'd like to produce the following result:

df([[ NaN,  1,  2,  3],
    [ NaN,  NaN,  6,  7],
    [ NaN,  NaN,  NaN, 11],
    [ NaN,  NaN,  NaN,  NaN]])

So far I've tried using np.tril_indices, but it only works once the df is converted back into a numpy array, and it only works for integer assignments (not np.nan):

il1 = np.tril_indices(4)
a[il1] = 0

gives:

array([[ 0,  1,  2,  3],
       [ 0,  0,  6,  7],
       [ 0,  0,  0, 11],
       [ 0,  0,  0,  0]])

...which is almost what I'm looking for, but it fails when assigning np.nan instead of 0:

ValueError: cannot convert float NaN to integer

while:

df[il1] = 0

gives:

TypeError: unhashable type: 'numpy.ndarray'

So if I want to fill the bottom triangle of a dataframe with NaN: 1) does it have to be a numpy array, or can I do this with pandas directly? And 2) is there a way to fill the bottom triangle with NaN rather than using numpy.fill_diagonal and incrementing the offset row by row down the whole DataFrame?

Another failed solution: filling the lower triangle of the np array with zeros, then masking on zero and reassigning to np.nan. It converts zero values above the diagonal to NaN when they should be preserved as zero!
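To illustrate why the zero-sentinel idea loses data, here is a minimal sketch. The 3x3 array is hypothetical and chosen only so that a genuine zero sits above the diagonal; the names sentinel and positional are likewise made up for this example.

```python
import numpy as np

# Hypothetical data with a legitimate zero above the diagonal at [0, 2].
a = np.array([[1., 2., 0.],
              [3., 4., 5.],
              [6., 7., 8.]])

# Sentinel approach: zero the lower triangle, then turn every zero into NaN.
sentinel = a.copy()
sentinel[np.tril_indices(3)] = 0
sentinel[sentinel == 0] = np.nan

# Positional approach: assign NaN by index, leaving other values untouched.
positional = a.copy()
positional[np.tril_indices(3)] = np.nan

print(np.isnan(sentinel[0, 2]))  # True: the genuine zero was overwritten
print(positional[0, 2])          # 0.0: the genuine zero is preserved
```

Assigning by position avoids the problem because it never has to tell a placeholder zero apart from a real one.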

jezrael · Accepted Answer

You need to cast a to float, because NaN is a float value and cannot be stored in an integer array:

import numpy as np
import pandas as pd
a = np.arange(16).reshape(4, 4).astype(float)
print (a)
[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]]


il1 = np.tril_indices(4)
a[il1] = np.nan
print (a)
[[ nan   1.   2.   3.]
 [ nan  nan   6.   7.]
 [ nan  nan  nan  11.]
 [ nan  nan  nan  nan]]

df = pd.DataFrame(data=a, columns=['a','b','c','d'])
print (df)
    a    b    c     d
0 NaN  1.0  2.0   3.0
1 NaN  NaN  6.0   7.0
2 NaN  NaN  NaN  11.0
3 NaN  NaN  NaN   NaN
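On the "can I do this with pandas directly" part of the question, one possible sketch is to pass a boolean mask to DataFrame.where, which keeps values where the mask is True and fills NaN elsewhere. This is an alternative to the approach above, not part of it, and the names keep and result are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['a', 'b', 'c', 'd'])

# True strictly above the main diagonal (k=1), False on and below it.
keep = np.triu(np.ones(df.shape, dtype=bool), k=1)

# DataFrame.where keeps values where the condition is True and
# replaces the rest with NaN, returning a new DataFrame.
result = df.where(keep)
print(result)
```

Because where returns a new frame, the original df keeps its integer values, and no explicit cast to float is needed beforehand.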

Divakar · Answer

An approach using np.where -

m,n = df.shape
df[:] = np.where(np.arange(m)[:,None] >= np.arange(n),np.nan,df)

Sample run -

In [93]: df
Out[93]: 
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

In [94]: m,n = df.shape

In [95]: df[:] = np.where(np.arange(m)[:,None] >= np.arange(n),np.nan,df)

In [96]: df
Out[96]: 
    a    b    c     d
0 NaN  1.0  2.0   3.0
1 NaN  NaN  6.0   7.0
2 NaN  NaN  NaN  11.0
3 NaN  NaN  NaN   NaN
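The comparison np.arange(m)[:,None] >= np.arange(n) relies on broadcasting: a column of row indices is compared against a row of column indices, giving an (m, n) boolean grid that is True on and below the diagonal. The sketch below prints that mask and, as an assumption on my part rather than something from the answer, builds a new DataFrame from the result instead of assigning in place with df[:].

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['a', 'b', 'c', 'd'])
m, n = df.shape

# (m, 1) row indices vs (n,) column indices broadcast to an (m, n) grid.
mask = np.arange(m)[:, None] >= np.arange(n)
print(mask)

# The same mask can be built with np.tril on an all-True array.
assert (mask == np.tril(np.ones((m, n), dtype=bool))).all()

# Build a new frame from the masked values, keeping the original labels.
out = pd.DataFrame(np.where(mask, np.nan, df), index=df.index, columns=df.columns)
print(out)
```

The mask expression also works for non-square shapes, since m and n are taken independently from df.shape.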
