I cannot figure out how to do a "reverse melt" using pandas in Python. This is my starting data:
    import pandas as pd
    from io import StringIO  # Python 3 location of StringIO

    # The sample is whitespace-separated, so say so explicitly
    origin = pd.read_table(StringIO('''label type value
    x a 1
    x b 2
    x c 3
    y a 4
    y b 5
    y c 6
    z a 7
    z b 8
    z c 9'''), sep=r'\s+')

    origin
    Out[5]:
      label type  value
    0     x    a      1
    1     x    b      2
    2     x    c      3
    3     y    a      4
    4     y    b      5
    5     y    c      6
    6     z    a      7
    7     z    b      8
    8     z    c      9
This is the output I would like to have:
    label  a  b  c
    x      1  2  3
    y      4  5  6
    z      7  8  9
I'm sure there is an easy way to do this, but I don't know how.
The melt() function is useful for massaging a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns, variable and value.
The pandas melt() function changes a DataFrame from wide format to long format. One or more columns act as identifiers; all remaining columns are treated as values and unpivoted to the row axis, leaving only two non-identifier columns, variable and value.
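To make the wide-to-long direction concrete, here is a minimal sketch that melts the table the question wants as output back into the question's starting layout. The frame name `wide` and the `var_name`/`value_name` choices are assumptions made for illustration.

```python
import pandas as pd

# Wide table: one row per label, one column per type
# (this is the output the question asks for).
wide = pd.DataFrame({
    "label": ["x", "y", "z"],
    "a": [1, 4, 7],
    "b": [2, 5, 8],
    "c": [3, 6, 9],
})

# "label" stays as the identifier; a, b, c are unpivoted into rows.
# var_name / value_name replace the default "variable" / "value" names.
long = wide.melt(id_vars="label", var_name="type", value_name="value")

# melt stacks the data column by column, so sort to get the
# label-by-label row order used in the question.
long = long.sort_values(["label", "type"]).reset_index(drop=True)
print(long)
```

The "reverse melt" asked about here is the opposite trip, from `long` back to `wide`.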
One way to flatten a pandas DataFrame is through the NumPy package. NumPy arrays have a flatten() method (numpy.ndarray.flatten) that performs this task: first convert the DataFrame to a NumPy array with to_numpy(), then call flatten() on the result.
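A short sketch of that flattening step, using a small made-up frame (the name `df_wide` and its values are assumptions for illustration). Note that this produces a one-dimensional array and discards the labels, so it is a different operation from reshaping between long and wide.

```python
import pandas as pd

# Hypothetical 3x3 frame used only to illustrate flattening.
df_wide = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

# to_numpy() gives a 2-D array; flatten() copies it into 1-D, row by row.
flat = df_wide.to_numpy().flatten()
print(flat)
```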
There are a few ways:
Using .pivot:

    >>> origin.pivot(index='label', columns='type')['value']
    type   a  b  c
    label
    x      1  2  3
    y      4  5  6
    z      7  8  9

    [3 rows x 3 columns]
Using pivot_table:

    >>> origin.pivot_table(values='value', index='label', columns='type')
           value
    type       a  b  c
    label
    x          1  2  3
    y          4  5  6
    z          7  8  9

    [3 rows x 3 columns]
Or .groupby followed by .unstack:

    >>> origin.groupby(['label', 'type'])['value'].aggregate('mean').unstack()
    type   a  b  c
    label
    x      1  2  3
    y      4  5  6
    z      7  8  9

    [3 rows x 3 columns]
DataFrame.set_index + DataFrame.unstack (here df is the question's starting DataFrame):

    df.set_index(['label','type'])['value'].unstack()
    type   a  b  c
    label
    x      1  2  3
    y      4  5  6
    z      7  8  9
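The four approaches so far can be run side by side as a quick check. This is a sketch that rebuilds the question's data under the assumption that it is whitespace-separated; the result names (`by_pivot` and so on) are made up for illustration.

```python
from io import StringIO

import pandas as pd

# The question's data, assumed here to be whitespace-separated.
origin = pd.read_csv(StringIO("""label type value
x a 1
x b 2
x c 3
y a 4
y b 5
y c 6
z a 7
z b 8
z c 9"""), sep=r"\s+")

# 1. pivot: pure reshape, requires unique (label, type) pairs.
by_pivot = origin.pivot(index="label", columns="type", values="value")

# 2. pivot_table: aggregates (mean by default), so duplicates are allowed.
by_pivot_table = origin.pivot_table(values="value", index="label", columns="type")

# 3. groupby + unstack: the same aggregation spelled out explicitly.
by_groupby = origin.groupby(["label", "type"])["value"].mean().unstack()

# 4. set_index + unstack: pure reshape, like pivot.
by_unstack = origin.set_index(["label", "type"])["value"].unstack()

for name, frame in [("pivot", by_pivot), ("pivot_table", by_pivot_table),
                    ("groupby", by_groupby), ("unstack", by_unstack)]:
    print(name)
    print(frame, end="\n\n")
```

Because every (label, type) pair occurs once, taking the mean changes nothing except possibly the dtype, which may become float in the two aggregating variants.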
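Simplifying the passing of pivot arguments (this relies on the columns being in index, columns, values order, and on a pandas version that still accepts positional arguments to pivot; newer releases require keywords):

    df.pivot(*df)
    type   a  b  c
    label
    x      1  2  3
    y      4  5  6
    z      7  8  9

    [*df]
    # ['label', 'type', 'value']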
For the expected output we need DataFrame.reset_index and DataFrame.rename_axis:

    df.pivot(*df).rename_axis(columns = None).reset_index()
      label  a  b  c
    0     x  1  2  3
    1     y  4  5  6
    2     z  7  8  9
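Recent pandas releases accept only keyword arguments to pivot, so `df.pivot(*df)` may raise a TypeError there. The sketch below keeps the same idea with keywords; the names `index_col`, `columns_col`, `values_col` and `result` are assumptions for illustration, and the frame is rebuilt inline so the block runs on its own.

```python
import pandas as pd

# The question's long-format data, rebuilt inline.
df = pd.DataFrame({
    "label": list("xxxyyyzzz"),
    "type": list("abcabcabc"),
    "value": range(1, 10),
})

# Same unpacking idea as *df, but passed by keyword.
# Assumes the columns are ordered index, columns, values.
index_col, columns_col, values_col = df.columns

result = (
    df.pivot(index=index_col, columns=columns_col, values=values_col)
      .rename_axis(columns=None)  # drop the "type" name from the column axis
      .reset_index()              # turn "label" back into an ordinary column
)
print(result)
```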
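If the label, type columns contain duplicate pairs, pivoting can lose information (pivot raises an error and pivot_table aggregates the repeats), so we need GroupBy.cumcount to tell the repeats apart: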
    print(df)
      label type  value
    0     x    a      1
    1     x    b      2
    2     x    c      3
    3     y    a      4
    4     y    b      5
    5     y    c      6
    6     z    a      7
    7     z    b      8
    8     z    c      9
    0     x    a      1
    1     x    b      2
    2     x    c      3
    3     y    a      4
    4     y    b      5
    5     y    c      6
    6     z    a      7
    7     z    b      8
    8     z    c      9

    df.pivot_table(index = ['label', df.groupby(['label','type']).cumcount()],
                   columns = 'type',
                   values = 'value')
    type     a  b  c
    label
    x     0  1  2  3
          1  1  2  3
    y     0  4  5  6
          1  4  5  6
    z     0  7  8  9
          1  7  8  9
Or:
    (df.assign(type_2 = df.groupby(['label','type']).cumcount())
       .set_index(['label','type','type_2'])['value']
       .unstack('type'))
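A self-contained sketch of the duplicate case follows. It builds the doubled frame with `ignore_index=True` (an assumption; the printed frame above keeps the repeated 0..8 index) and uses the hypothetical names `base`, `occurrence` and `wide`.

```python
import pandas as pd

base = pd.DataFrame({
    "label": list("xxxyyyzzz"),
    "type": list("abcabcabc"),
    "value": range(1, 10),
})

# Every (label, type) pair now appears twice.
df = pd.concat([base, base], ignore_index=True)

# A plain pivot cannot place two values in one cell.
pivot_failed = False
try:
    df.pivot(index="label", columns="type", values="value")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# cumcount numbers the repeats within each (label, type) pair: 0, 1, ...
occurrence = df.groupby(["label", "type"]).cumcount()

# Adding that counter to the index makes every row key unique again.
wide = (
    df.assign(occurrence=occurrence)
      .set_index(["label", "occurrence", "type"])["value"]
      .unstack("type")
)
print(wide)
```

The extra index level keeps one output row per repeat instead of collapsing the repeats into a single aggregated value.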