Melt the Upper Triangular Matrix of a Pandas Dataframe

Given a square pandas DataFrame of the following form:

   a  b  c
a  1 .5 .3
b .5  1 .4
c .3 .4  1

How can the upper triangle be melted to get a matrix of the following form?

 Row     Column    Value
  a        a       1
  a        b       .5 
  a        c       .3
  b        b       1
  b        c       .4
  c        c       1 

Note: the combination a,b is listed only once; there is no b,a entry.

I'm more interested in an idiomatic pandas solution; a custom indexer would be easy enough to write by hand.

Thank you in advance for your consideration and response.

asked Dec 22 '15 14:12

Ramón J Romero y Vigil

3 Answers

First, I convert the values of df below the diagonal to NaN using where and numpy.triu, then call stack and reset_index, and finally set the column names:

import numpy as np

print(df)
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

# np.bool was removed in NumPy 1.24; the builtin bool works everywhere
print(np.triu(np.ones(df.shape)).astype(bool))
[[ True  True  True]
 [False  True  True]
 [False False  True]]

# Keep the upper triangle (diagonal included); everything else becomes NaN
df = df.where(np.triu(np.ones(df.shape)).astype(bool))
print(df)
    a    b    c
a   1  0.5  0.3
b NaN  1.0  0.4
c NaN  NaN  1.0

df = df.stack().reset_index()
df.columns = ['Row','Column','Value']
print(df)

  Row Column  Value
0   a      a    1.0
1   a      b    0.5
2   a      c    0.3
3   b      b    1.0
4   b      c    0.4
5   c      c    1.0
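
The steps above can be collected into one chain. This is only a sketch, assuming a square frame whose upper triangle holds no genuine NaN values; the explicit dropna() is my addition, so the result does not depend on whether stack() drops NaN rows by itself in the installed pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)

# True on and above the diagonal, False below it
mask = np.triu(np.ones(df.shape, dtype=bool))

melted = (
    df.where(mask)                    # lower triangle becomes NaN
    .stack()                          # long format with a (row, column) MultiIndex
    .dropna()                         # assumption: no real NaN in the upper triangle
    .rename_axis(["Row", "Column"])   # name the two index levels
    .reset_index(name="Value")
)
print(melted)
```

If the data can contain real missing values in the upper triangle, dropna() would discard them as well, so a different marker for the masked cells would be needed.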

answered Oct 21 '22 01:10

jezrael

Building on the solution by @jezrael, boolean indexing would be a more explicit approach:

import numpy as np
from pandas import DataFrame

df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print(df, '\n')
# Flatten the upper-triangle mask so it lines up with the stacked Series
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print(df.stack()[keep])

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0 

a  a    1.0
   b    0.5
   c    0.3
b  b    1.0
   c    0.4
c  c    1.0
dtype: float64
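
Indexing the stacked Series with a flat mask relies on the Series having exactly df.size entries, which may not hold if stack() drops NaN values. A related variant, not taken from the answer above, swaps the boolean mask for integer positions from np.triu_indices_from and skips stack() entirely. Treat it as a sketch under the assumption that the frame is square:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)

values = df.to_numpy()

# Integer positions of the upper triangle; k=0 includes the diagonal,
# k=1 would leave it out
rows, cols = np.triu_indices_from(values, k=0)

upper = pd.DataFrame(
    {
        "Row": df.index[rows],        # row labels at those positions
        "Column": df.columns[cols],   # column labels at those positions
        "Value": values[rows, cols],  # the matching cell values
    }
)
print(upper)
```

Because the values are picked by position, any NaN in the upper triangle is kept as-is rather than silently dropped.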

answered Oct 21 '22 02:10

Matthew Davis

Also building on the solution by @jezrael, here's a version that adds a function for the inverse operation (from xy back to a matrix), which is useful in my case for working with covariance / correlation matrices.

import numpy as np
import pandas as pd


def matrix_to_xy(df, columns=None, reset_index=False):
    bool_index = np.triu(np.ones(df.shape)).astype(bool)
    xy = (
        df.where(bool_index).stack().reset_index()
        if reset_index
        else df.where(bool_index).stack()
    )
    if reset_index:
        xy.columns = columns or ["row", "col", "val"]
    return xy


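def xy_to_matrix(xy):
    # pivot() only accepts keyword arguments since pandas 2.0
    index, columns, values = xy.columns
    df = xy.pivot(index=index, columns=columns, values=values).fillna(0)
    df_vals = df.to_numpy()
    # Strict upper triangle plus the transpose (which carries the diagonal)
    df = pd.DataFrame(
        np.triu(df_vals, 1) + df_vals.T, index=df.index, columns=df.index
    )
    return df


df = pd.DataFrame(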
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)
print(df)
xy = matrix_to_xy(df, reset_index=True)
print(xy)
mx = xy_to_matrix(xy)
print(mx)

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

  row col  val
0   a   a  1.0
1   a   b  0.5
2   a   c  0.3
3   b   b  1.0
4   b   c  0.4
5   c   c  1.0

row    a    b    c
row
a    1.0  0.5  0.3
b    0.5  1.0  0.4
c    0.3  0.4  1.0
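
A quick way to gain confidence in a pair of functions like these is a round-trip check. The sketch below uses compact stand-ins (to_long and to_square are hypothetical names, not the functions above) and assumes a symmetric matrix whose row and column labels are identical and already sorted, since pivot sorts the labels it returns.

```python
import numpy as np
import pandas as pd


def to_long(df):
    """Upper triangle (diagonal included) as a long row/col/val frame."""
    mask = np.triu(np.ones(df.shape, dtype=bool))
    long = df.where(mask).stack().dropna().reset_index()
    long.columns = ["row", "col", "val"]
    return long


def to_square(long):
    """Rebuild the full symmetric matrix from the long frame."""
    upper = long.pivot(index="row", columns="col", values="val").fillna(0)
    vals = upper.to_numpy()
    # Strict upper triangle plus the transpose (which carries the diagonal)
    full = np.triu(vals, 1) + vals.T
    return pd.DataFrame(full, index=upper.index, columns=upper.index)


df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)

restored = to_square(to_long(df))
# Compare values only; the restored frame carries "row" as its axis name
round_trip_ok = np.allclose(restored.to_numpy(), df.to_numpy())
print(round_trip_ok)
```

With unsorted labels the pivoted frame is no longer upper triangular, so the reconstruction step would need the original label order passed in explicitly.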

answered Oct 21 '22 01:10

bravhek

Tags: python, pandas, numpy, reshape, melt