I have a symmetric square DataFrame in pandas:
import numpy as np
import pandas as pd

a = np.random.rand(3, 3)
a = (a + a.T) / 2           # symmetrize
np.fill_diagonal(a, 1.)     # set the diagonal to 1
a = pd.DataFrame(a)
That looks like this:
          0         1         2
0  1.000000  0.747064  0.357616
1  0.747064  1.000000  0.631622
2  0.357616  0.631622  1.000000
If I apply the stack method, I get lots of redundant information (including the diagonal, which I'm not interested in):
0  0    1.000000
   1    0.747064
   2    0.357616
1  0    0.747064
   1    1.000000
   2    0.631622
2  0    0.357616
   1    0.631622
   2    1.000000
Is there a way to get only the lower (or upper) triangle, like this, using "pure" pandas?
1  0    0.747064
2  0    0.357616
   1    0.631622
You could use mask to blank out the upper triangle (including the diagonal) before stacking:
In [278]: a.mask(np.triu(np.ones(a.shape)).astype(bool)).stack()
Out[278]:
1  0    0.747064
2  0    0.357616
   1    0.631622
dtype: float64
Or use where to keep only the strict lower triangle:
In [285]: a.where(np.tril(np.ones(a.shape), -1).astype(bool)).stack()
Out[285]:
1  0    0.747064
2  0    0.357616
   1    0.631622
dtype: float64
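For reference, here is a self-contained sketch of the where approach that extracts either triangle. The fixed values are copied from the question's example, and the names lower_mask, upper_mask, lower and upper are only illustrative. The trailing dropna() is an assumption on my part: as far as I know, newer pandas versions can keep NaN rows when stacking, so dropping them explicitly should give the same result either way.

```python
import numpy as np
import pandas as pd

# Fixed symmetric matrix taken from the question's example output.
a = pd.DataFrame(
    [[1.000000, 0.747064, 0.357616],
     [0.747064, 1.000000, 0.631622],
     [0.357616, 0.631622, 1.000000]]
)

# Boolean masks for the strict lower and strict upper triangles
# (k=-1 and k=1 exclude the diagonal).
lower_mask = np.tril(np.ones(a.shape, dtype=bool), k=-1)
upper_mask = np.triu(np.ones(a.shape, dtype=bool), k=1)

# where() keeps values where the mask is True and sets the rest to NaN;
# dropna() removes any NaN entries that survive stacking.
lower = a.where(lower_mask).stack().dropna()
upper = a.where(upper_mask).stack().dropna()

print(lower)
print(upper)
```

Because a.where returns a new DataFrame, the original a is left unchanged.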
The easiest way I could think of is to force the upper (or lower) triangle to NaN, as by default the stack method will not include NaNs (note that this overwrites a in place):
a.values[np.triu_indices_from(a, 0)] = np.nan
a.stack()
which gives:
1  0    0.747064
2  0    0.357616
   1    0.631622
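If you would rather not modify a, or if a.values turns out to be read-only (which I believe is the case when copy-on-write is enabled in recent pandas), a variant that works on an explicit copy might look like the sketch below. The variable names values and lower are illustrative, and the dropna() call is a precaution for pandas versions whose stack keeps NaN rows.

```python
import numpy as np
import pandas as pd

# Fixed symmetric matrix taken from the question's example output.
a = pd.DataFrame(
    [[1.000000, 0.747064, 0.357616],
     [0.747064, 1.000000, 0.631622],
     [0.357616, 0.631622, 1.000000]]
)

# Work on an explicit, writable copy so the original frame stays intact.
values = a.to_numpy(dtype=float, copy=True)

# Blank out the upper triangle, diagonal included (k=0).
values[np.triu_indices_from(values, k=0)] = np.nan

# Rebuild a frame with the original labels, then stack and drop the NaNs.
lower = pd.DataFrame(values, index=a.index, columns=a.columns).stack().dropna()

print(lower)
```

Use np.tril_indices_from(values, k=0) instead to keep the upper triangle.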