Melt the Upper Triangular Matrix of a Pandas Dataframe

Given a square pandas DataFrame of the following form:

   a  b  c
a  1 .5 .3
b .5  1 .4
c .3 .4  1

How can the upper triangle be melted to get a long-format table of the following form?

 Row     Column    Value
  a        a       1
  a        b       .5 
  a        c       .3
  b        b       1
  b        c       .4
  c        c       1 

Note that the combination a,b is listed only once; there is no b,a entry.

I'm more interested in an idiomatic pandas solution; a custom indexer would be easy enough to write by hand.

Thank you in advance for your consideration and response.

Ramón J Romero y Vigil asked Dec 22 '15 14:12



3 Answers

First, set the values below the diagonal to NaN with where and numpy.triu; then stack, reset_index, and set the column names:

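import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({'a': [1, .5, .3], 'b': [.5, 1, .4], 'c': [.3, .4, 1]}, index=list('abc'))
print(df)
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

# Boolean mask that is True on and above the diagonal
# (the np.bool alias is gone from current NumPy; use the builtin bool)
mask = np.triu(np.ones(df.shape)).astype(bool)
print(mask)
[[ True  True  True]
 [False  True  True]
 [False False  True]]

# Keep the upper triangle; everything below the diagonal becomes NaN
df = df.where(mask)
print(df)
     a    b    c
a  1.0  0.5  0.3
b  NaN  1.0  0.4
c  NaN  NaN  1.0

# Stack to long form. dropna() discards the masked cells explicitly,
# in case stack() keeps NaN rows in your pandas version
df = df.stack().dropna().reset_index()
df.columns = ['Row', 'Column', 'Value']
print(df)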

  Row Column  Value
0   a      a    1.0
1   a      b    0.5
2   a      c    0.3
3   b      b    1.0
4   b      c    0.4
5   c      c    1.0
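
The steps above can be collected into one helper. This is only a minimal sketch: the function name `melt_upper` and its `k` and `names` parameters are hypothetical, not part of pandas. `k` is passed straight to `numpy.triu`, so `k=1` would leave out the diagonal.

```python
import numpy as np
import pandas as pd


def melt_upper(df, k=0, names=("Row", "Column", "Value")):
    """Melt the upper triangle of a square DataFrame into long format.

    Hypothetical helper: k=0 keeps the diagonal, k=1 excludes it.
    """
    # Boolean mask that is True on and above the k-th diagonal
    mask = np.triu(np.ones(df.shape, dtype=bool), k=k)
    # Masked cells become NaN; dropna() removes them whether or not
    # stack() already did so
    out = df.where(mask).stack().dropna().reset_index()
    out.columns = list(names)
    return out


df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)
print(melt_upper(df))       # diagonal included
print(melt_upper(df, k=1))  # diagonal excluded
```

Note that genuine NaN values inside the upper triangle are dropped as well, which matches the behaviour of the step-by-step version.
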
jezrael answered Oct 21 '22 01:10


Building on the solution by @jezrael, boolean indexing is a more explicit approach:

import numpy as np
from pandas import DataFrame

df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print(df, '\n')
# Flatten the upper-triangle mask so it lines up with the stacked (row-major) Series
keep = np.triu(np.ones(df.shape)).astype(bool).reshape(df.size)
print(df.stack()[keep])

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0 

a  a    1.0
   b    0.5
   c    0.3
b  b    1.0
   c    0.4
c  c    1.0
dtype: float64
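
One caveat with masking the stacked Series: `stack()` has historically dropped NaN entries by default, so if the matrix itself contains missing values, the flattened mask may no longer line up with the stacked result. A sketch that sidesteps this by indexing the underlying array with `numpy.triu_indices` follows; the helper name `upper_triangle_long` and the example frame `df_nan` are made up for illustration.

```python
import numpy as np
import pandas as pd


def upper_triangle_long(df):
    """Hypothetical helper: upper triangle of a square frame as a long table."""
    # Row and column positions of the upper triangle, in row-major order
    rows, cols = np.triu_indices(len(df))
    return pd.DataFrame(
        {
            "Row": df.index[rows],
            "Column": df.columns[cols],
            "Value": df.to_numpy()[rows, cols],
        }
    )


# Same matrix as in the question, but with one missing value at (a, b)
df_nan = pd.DataFrame(
    [[1.0, np.nan, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]],
    index=list("abc"),
    columns=list("abc"),
)
print(upper_triangle_long(df_nan))
```

Because nothing is stacked, the missing value stays in the result as NaN instead of shifting the remaining rows.
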
Matthew Davis answered Oct 21 '22 02:10


Also building on the solution by @jezrael, here's a version that adds a function for the inverse operation (from xy back to a matrix), useful in my case for working with covariance / correlation matrices.

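import numpy as np
import pandas as pd


def matrix_to_xy(df, columns=None, reset_index=False):
    # Mask that is True on and above the diagonal
    bool_index = np.triu(np.ones(df.shape)).astype(bool)
    # dropna() discards the masked cells even if stack() keeps NaN rows
    xy = df.where(bool_index).stack().dropna()
    if reset_index:
        xy = xy.reset_index()
        xy.columns = columns or ["row", "col", "val"]
    return xy


def xy_to_matrix(xy):
    # Expects the three-column (row, col, val) frame from reset_index=True.
    # pivot() takes keyword-only arguments since pandas 2.0.
    row, col, val = xy.columns
    df = xy.pivot(index=row, columns=col, values=val).fillna(0)
    df_vals = df.to_numpy()
    # Strict upper triangle plus the transpose (lower triangle and diagonal)
    df = pd.DataFrame(
        np.triu(df_vals, 1) + df_vals.T, index=df.index, columns=df.index
    )
    return df


df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)
print(df)
xy = matrix_to_xy(df, reset_index=True)
print(xy)
mx = xy_to_matrix(xy)
print(mx)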

output:

     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

  row col  val
0   a   a  1.0
1   a   b  0.5
2   a   c  0.3
3   b   b  1.0
4   b   c  0.4
5   c   c  1.0

row    a    b    c
row
a    1.0  0.5  0.3
b    0.5  1.0  0.4
c    0.3  0.4  1.0
bravhek answered Oct 21 '22 01:10