I have a pandas dataframe in the rough format
print(df)
Time GroupA GroupB Value1 Value2
0 100.0 1.0 1.0 18.0 0.0
1 100.0 1.0 2.0 16.0 0.0
2 100.0 2.0 1.0 18.0 0.0
3 100.0 2.0 2.0 10.0 0.0
where Time
is a count variable / timestamp, GroupA
and GroupB
are categories, and Value1
and Value2
are numerical quantities. This code snippet creates a mockup dataframe:
import numpy as np
values = np.zeros(shape=(4,5))
values[:,0] = 100
values[:,1] = [1]*2 + [2]*2
values[:,2] = [1,2]*2
values[:,3] = np.random.randint(low=10,high=20,size=(4))
df = pd.DataFrame(values,columns=['Time','GroupA','GroupB','Value1','Value2'])
After loading in some data, I want to calculate and fill in values of Value2
. As it happens (since, incidentally, Value2
is a time series function of Value1
within each existing (GroupA
, GroupB
) pair), I found it easiest to calculate these values by first pivoting my data into the form:
df_pivot = df.pivot_table(index='Time',columns=['GroupA','GroupB'],values=['Value1','Value2'], fill_value=0.0)
Then after some unrelated code I have filled in values
print(df_pivot)
Value1 Value2
GroupA 1.0 2.0 1.0 2.0
GroupB 1.0 2.0 1.0 2.0 1.0 2.0 1.0 2.0
Time
100.0 13 16 16 10 27 20 28 20
Now I want to "unpivot" this back to the original format of df
. I could do this manually by looping over df
, looking up the value in df_pivot
, and filling it, but I'd prefer to use built-in functions. Try as I might using variations of df.melt
, I cannot perform this inversion, because of problems with df_pivot
's hierarchical columns. My best attempt is
dfm = df_pivot.reset_index().melt(id_vars="Time")
dfm.columns.values[1] = "HACK"
dfm = dfm.pivot_table(index=["Time","GroupA","GroupB"],columns="HACK",values="value").reset_index()
which produces the data frame
print(dfm)
HACK Time GroupA GroupB Value1 Value2
0 100.0 1.0 1.0 13 27
1 100.0 1.0 2.0 16 20
2 100.0 2.0 1.0 16 28
3 100.0 2.0 2.0 10 20
This works, but doesn't strike me as be best solution, or very portable (why does melt produce a "NaN" column name? why do I have a manually find the index of this column and rename it? why do I have to pivot to undo a pivot?) Experimenting and looking through documentation and examples for an alternative, I'm at a loss, though. The melt
function has a col_level
argument that looks like it should help, but any valid value I use for this just leads to data loss (losing the "Time", "GroupA", or "GroupB" data).
SQL PIVOT transposes a table-valued expression from a unique set of values from one column into multiple columns in the output and performs aggregations. SQL UNPIVOT performs the opposite operation of SQL PIVOT by transforming the columns of a table-valued expression into column values.
melt() function to reverse our pivoted data. The . melt() function comes in handy when you need to reshape or unpivot a DataFrame.
Pandas has a pivot_table function that applies a pivot on a DataFrame. It also supports aggfunc that defines the statistic to calculate when pivoting (aggfunc is np. mean by default, which calculates the average).
We can also do the reverse of the melt operation which is also called as Pivoting. In Pivoting or Reverse Melting, we convert a column with multiple values into several columns of their own. The pivot() method on the dataframe takes two main arguments index and columns .
I think stack
is more straightforward
df_pivot.stack([1,2]).reset_index()
Out[8]:
Time GroupA GroupB Value1 Value2
0 100.0 1.0 1.0 13 0
1 100.0 1.0 2.0 13 0
2 100.0 2.0 1.0 12 0
3 100.0 2.0 2.0 11 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With