Given a square pandas DataFrame of the following form:
    a   b   c
a   1  .5  .3
b  .5   1  .4
c  .3  .4   1
How can the upper triangle be melted to get a matrix of the following form?
Row  Column  Value
a    a       1
a    b       .5
a    c       .3
b    b       1
b    c       .4
c    c       1
#Note the combination a,b is only listed once. There is no b,a listing
I'm more interested in an idiomatic pandas solution; a custom indexer would be easy enough to write by hand...
Thank you in advance for your consideration and response.
NumPy's triu() is a built-in function that returns a copy of an array with the elements below the k-th diagonal set to zero, so only the upper triangle is kept.
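As a small illustration of the k argument described above, here is a minimal sketch on the matrix from the question. The variable names are only for illustration.

```python
import numpy as np

m = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

# k=0 (the default): keep the main diagonal and everything above it
upper = np.triu(m)

# k=1: also zero out the main diagonal, leaving only the strict upper triangle
strict_upper = np.triu(m, k=1)

print(upper)
print(strict_upper)
```

With k=1 the diagonal is excluded, which is handy when the diagonal carries no information (for example the 1s of a correlation matrix).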
The pandas melt() function changes a DataFrame from wide to long format. One or more columns act as identifiers; all remaining columns are treated as values and unpivoted to the row axis, leaving only two non-identifier columns: variable and value.
You can use the following basic syntax to convert a pandas DataFrame from wide to long format: df = pd.melt(df, id_vars='col1', value_vars=['col2', 'col3', ...]). In this scenario, col1 is the column we use as an identifier and col2, col3, etc. are the columns that get unpivoted.
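To make that syntax concrete, here is a minimal sketch on a tiny made-up frame. The data is hypothetical; the column names simply mirror the col1, col2, col3 placeholders above.

```python
import pandas as pd

# Hypothetical wide-format data: one identifier column and two value columns
wide = pd.DataFrame({
    "col1": ["x", "y"],
    "col2": [1, 2],
    "col3": [3, 4],
})

# col1 stays as the identifier; col2 and col3 are unpivoted into rows
long = pd.melt(wide, id_vars="col1", value_vars=["col2", "col3"])
print(long)
```

Each original row contributes one output row per value column, so the 2x2 block of values becomes 4 rows with the default column names variable and value.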
pandas.melt() converts a wide DataFrame into a longer form. It is useful when a specific column has to be kept as an identifier. Syntax: pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Pandas.melt() unpivots a DataFrame from wide format to long format. melt() function is useful to massage a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns, variable and value.
The DataFrame.melt() method does the same thing when called on a DataFrame: it unpivots from wide format to long format, optionally leaving identifier variables set.
Reshaping plays a crucial role in data analysis. pandas provides functions such as melt() for going from wide to long, and pivot() for going back ("unmelting").
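Putting triu and melt together for the matrix in the question might look like the sketch below. It is one possible combination, not necessarily the most idiomatic one. The Row, Column and Value names come from the question; everything else is an illustrative assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)

# True on and above the main diagonal, False below it
mask = np.triu(np.ones(df.shape, dtype=bool))

long = (
    df.where(mask)               # lower triangle becomes NaN
      .rename_axis("Row")        # name the index so reset_index yields a "Row" column
      .reset_index()
      .melt(id_vars="Row", var_name="Column", value_name="Value")
      .dropna(subset=["Value"])  # drop the masked lower-triangle entries
      .sort_values(["Row", "Column"])  # melt emits column by column; sort back to row order
      .reset_index(drop=True)
)
print(long)
```

Note that the final sort is alphabetical, which matches the matrix order here only because the labels a, b, c are already sorted.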
First I convert the lower-triangle values of df to NaN with where and numpy.triu, then stack, reset_index, and set the column names:
import numpy as np
print(df)
a b c
a 1.0 0.5 0.3
b 0.5 1.0 0.4
c 0.3 0.4 1.0
print(np.triu(np.ones(df.shape)).astype(bool))
[[ True True True]
[False True True]
[False False True]]
# np.bool was removed from recent NumPy releases; the builtin bool works everywhere
df = df.where(np.triu(np.ones(df.shape)).astype(bool))
print(df)
a b c
a 1 0.5 0.3
b NaN 1.0 0.4
c NaN NaN 1.0
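# newer pandas releases may keep NaN rows when stacking, so dropna() is chained to be safe
df = df.stack().dropna().reset_index()
df.columns = ['Row','Column','Value']
print(df)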
Row Column Value
0 a a 1.0
1 a b 0.5
2 a c 0.3
3 b b 1.0
4 b c 0.4
5 c c 1.0
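As an alternative to masking with NaN, the same long table can be built directly from index positions with np.triu_indices. This is a swapped-in technique rather than the answer's method, sketched here under the assumption that df is square. It avoids the NaN round trip, so genuine NaN values in the upper triangle would be kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)

# Positions of the upper triangle (diagonal included), in row-major order
rows, cols = np.triu_indices(len(df))

result = pd.DataFrame({
    "Row": df.index[rows],                # row label for each position
    "Column": df.columns[cols],           # column label for each position
    "Value": df.to_numpy()[rows, cols],   # the matching matrix entries
})
print(result)
```

Passing k=1 to np.triu_indices would leave out the diagonal.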
Building on the solution by @jezrael, boolean indexing would be a more explicit approach:
import numpy as np
from pandas import DataFrame

df = DataFrame({'a':[1,.5,.3],'b':[.5,1,.4],'c':[.3,.4,1]},index=list('abc'))
print(df, '\n')
# flatten the upper-triangle mask so it lines up with the stacked values
keep = np.triu(np.ones(df.shape)).astype('bool').reshape(df.size)
print(df.stack()[keep])
output:
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0

a  a    1.0
   b    0.5
   c    0.3
b  b    1.0
   c    0.4
c  c    1.0
dtype: float64
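The same boolean-indexing idea extends to dropping the diagonal, which is often what you want for a correlation matrix. The sketch below passes k=1 to np.triu. It assumes df contains no NaN, so that the stacked Series has exactly one entry per cell and lines up with the flattened mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)

# k=1 keeps only the entries strictly above the main diagonal
keep = np.triu(np.ones(df.shape, dtype=bool), k=1).ravel()

# Assumes df has no NaN, so stack() yields df.size entries in row-major order
pairs = df.stack()[keep]
print(pairs)
```

The result is a Series with a two-level index, one entry per unordered pair of distinct labels.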
Also building on the solution by @jezrael, here's a version that adds a function for the inverse operation (from xy to matrix), useful in my case for working with covariance / correlation matrices.
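import numpy as np
import pandas as pd


def matrix_to_xy(df, columns=None, reset_index=False):
    # keep the diagonal and everything above it
    bool_index = np.triu(np.ones(df.shape)).astype(bool)
    # dropna() changes nothing where stack() already drops NaN
    xy = df.where(bool_index).stack().dropna()
    if reset_index:
        xy = xy.reset_index()
        xy.columns = columns or ["row", "col", "val"]
    return xy


def xy_to_matrix(xy):
    # expects the three-column frame returned with reset_index=True;
    # pivot only accepts keyword arguments in pandas 2.0+
    row, col, val = xy.columns
    df = xy.pivot(index=row, columns=col, values=val).fillna(0)
    df_vals = df.to_numpy()
    # strict upper triangle plus the transpose fills in the lower half
    df = pd.DataFrame(
        np.triu(df_vals, 1) + df_vals.T, index=df.index, columns=df.index
    )
    return df


df = pd.DataFrame(
    {"a": [1, 0.5, 0.3], "b": [0.5, 1, 0.4], "c": [0.3, 0.4, 1]},
    index=list("abc"),
)
print(df)
xy = matrix_to_xy(df, reset_index=True)
print(xy)
mx = xy_to_matrix(xy)
print(mx)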
output:
     a    b    c
a  1.0  0.5  0.3
b  0.5  1.0  0.4
c  0.3  0.4  1.0
  row col  val
0   a   a  1.0
1   a   b  0.5
2   a   c  0.3
3   b   b  1.0
4   b   c  0.4
5   c   c  1.0
row    a    b    c
row
a    1.0  0.5  0.3
b    0.5  1.0  0.4
c    0.3  0.4  1.0