
Stack a square DataFrame to only keep the upper/lower triangle

I have a symmetric square DataFrame in pandas:

import numpy as np
import pandas as pd

a = np.random.rand(3, 3)
a = (a + a.T) / 2          # make the matrix symmetric
np.fill_diagonal(a, 1.)
a = pd.DataFrame(a)

That looks like this:

          0         1         2
0  1.000000  0.747064  0.357616
1  0.747064  1.000000  0.631622
2  0.357616  0.631622  1.000000

If I apply the stack method, I would get lots of redundant information (including the diagonal, in which I'm not interested):

0  0    1.000000
   1    0.747064
   2    0.357616
1  0    0.747064
   1    1.000000
   2    0.631622
2  0    0.357616
   1    0.631622
   2    1.000000

Is there a way to get only the lower (or upper) triangle using "pure" pandas?

1  0    0.747064
2  0    0.357616
   1    0.631622
mgalardini asked Aug 11 '17 09:08

2 Answers

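You could use mask, which replaces the cells where the condition is True (here the upper triangle, diagonal included) with NaN before stacking: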

In [278]: a.mask(np.triu(np.ones(a.shape)).astype(bool)).stack()
Out[278]:
1  0    0.747064
2  0    0.357616
   1    0.631622
dtype: float64

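Or use where, which keeps only the cells where the condition is True (here the strictly lower triangle) and sets the rest to NaN: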

In [285]: a.where(np.tril(np.ones(a.shape), -1).astype(bool)).stack()
Out[285]:
1  0    0.747064
2  0    0.357616
   1    0.631622
dtype: float64
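
For the upper triangle, the same idea works with the mask flipped. The following is a minimal sketch rather than part of the original answer: it assumes the values from the question's example, builds the boolean mask with np.triu and k=1, and calls dropna() explicitly so the result does not depend on how a given pandas version's stack treats missing values.

```python
import numpy as np
import pandas as pd

# Symmetric matrix with the values shown in the question (fixed, not random).
a = pd.DataFrame(
    [[1.0, 0.747064, 0.357616],
     [0.747064, 1.0, 0.631622],
     [0.357616, 0.631622, 1.0]]
)

# True strictly above the diagonal; k=1 excludes the diagonal itself.
keep_upper = np.triu(np.ones(a.shape, dtype=bool), k=1)

# where() keeps cells where the mask is True and sets the rest to NaN;
# dropna() then removes any NaN rows left after stacking.
upper = a.where(keep_upper).stack().dropna()
print(upper)
```

Swapping np.triu(..., k=1) for np.tril(..., k=-1) gives the lower triangle again.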
Zero answered Oct 20 '22 22:10

The easiest way I could think of is to force the upper (or lower) triangle to NaN, as by default the stack method will not include NaNs:

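arr = a.to_numpy(copy=True)                     # writable copy, so a itself is left unchanged
arr[np.triu_indices_from(arr, 0)] = np.nan     # blank the upper triangle and the diagonal
pd.DataFrame(arr, index=a.index, columns=a.columns).stack()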

which gives:

1  0    0.747064
2  0    0.357616
   1    0.631622
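
The same approach can be wrapped in a small function. This is only a sketch under a few assumptions: stack_lower_triangle is a hypothetical helper name (not a pandas function), the input is a square numeric DataFrame, and dropna() is called explicitly because relying on stack to discard NaNs may not hold for every pandas version.

```python
import numpy as np
import pandas as pd


def stack_lower_triangle(df):
    """Hypothetical helper: stack only the strictly lower triangle of df."""
    arr = df.to_numpy(dtype=float, copy=True)      # writable copy, df stays intact
    arr[np.triu_indices_from(arr, k=0)] = np.nan   # blank upper triangle + diagonal
    out = pd.DataFrame(arr, index=df.index, columns=df.columns)
    return out.stack().dropna()                    # drop the NaN rows explicitly


# Values taken from the question's example.
a = pd.DataFrame(
    [[1.0, 0.747064, 0.357616],
     [0.747064, 1.0, 0.631622],
     [0.357616, 0.631622, 1.0]]
)
result = stack_lower_triangle(a)
print(result)
```

Because the NaNs are written to a copy, the original frame keeps its diagonal and upper triangle.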
mgalardini answered Oct 21 '22 00:10