Exact inverse of pandas' "pivot" operation

Tags: python, pandas, pivot, melt

I have a pandas dataframe in the rough format

print(df)
    Time  GroupA  GroupB  Value1  Value2
0  100.0     1.0     1.0    18.0     0.0
1  100.0     1.0     2.0    16.0     0.0
2  100.0     2.0     1.0    18.0     0.0
3  100.0     2.0     2.0    10.0     0.0

where Time is a count variable / timestamp, GroupA and GroupB are categories, and Value1 and Value2 are numerical quantities. This code snippet creates a mockup dataframe:

import numpy as np
import pandas as pd

values = np.zeros(shape=(4,5))
values[:,0] = 100                   # Time
values[:,1] = [1]*2 + [2]*2         # GroupA
values[:,2] = [1,2]*2               # GroupB
values[:,3] = np.random.randint(low=10,high=20,size=(4))  # Value1; Value2 stays 0
df = pd.DataFrame(values,columns=['Time','GroupA','GroupB','Value1','Value2'])

After loading some data, I want to calculate and fill in the values of Value2. Because Value2 is a time series function of Value1 within each existing (GroupA, GroupB) pair, I found it easiest to calculate these values by first pivoting my data into the form:

df_pivot = df.pivot_table(index='Time',columns=['GroupA','GroupB'],values=['Value1','Value2'], fill_value=0.0)

Then, after some unrelated code, I have filled in the values:

print(df_pivot)
       Value1             Value2            
GroupA    1.0     2.0        1.0     2.0    
GroupB    1.0 2.0 1.0 2.0    1.0 2.0 1.0 2.0
Time                                        
100.0      13  16  16  10     27  20  28  20

Now I want to "unpivot" this back to the original format of df. I could do this manually by looping over df, looking up the value in df_pivot, and filling it, but I'd prefer to use built-in functions. Try as I might using variations of df.melt, I cannot perform this inversion, because of problems with df_pivot's hierarchical columns. My best attempt is

dfm = df_pivot.reset_index().melt(id_vars="Time")
dfm.columns.values[1] = "HACK"
dfm = dfm.pivot_table(index=["Time","GroupA","GroupB"],columns="HACK",values="value").reset_index()

which produces the data frame

print(dfm)
HACK   Time  GroupA  GroupB  Value1  Value2
0     100.0     1.0     1.0      13      27
1     100.0     1.0     2.0      16      20
2     100.0     2.0     1.0      16      28
3     100.0     2.0     2.0      10      20

This works, but it doesn't strike me as the best solution, or a very portable one. (Why does melt produce a "NaN" column name? Why do I have to manually find the index of this column and rename it? Why do I have to pivot to undo a pivot?) Experimenting and looking through documentation and examples for an alternative, I'm at a loss, though. The melt function has a col_level argument that looks like it should help, but any valid value I use for it just leads to data loss (losing the "Time", "GroupA", or "GroupB" data).
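On the first of those side questions: melt names its variable columns after the column level names, and pivot_table leaves the level that holds Value1/Value2 unnamed. The sketch below is only an illustration under assumptions: the data is hard-coded to match the printed df_pivot, and the label "Variable" is an arbitrary name chosen here, not something pandas provides.

```python
import pandas as pd

# Hard-coded stand-in for the question's data (values taken from the printed df_pivot).
df = pd.DataFrame({
    "Time":   [100.0, 100.0, 100.0, 100.0],
    "GroupA": [1.0, 1.0, 2.0, 2.0],
    "GroupB": [1.0, 2.0, 1.0, 2.0],
    "Value1": [13.0, 16.0, 16.0, 10.0],
    "Value2": [27.0, 20.0, 28.0, 20.0],
})

df_pivot = df.pivot_table(index="Time", columns=["GroupA", "GroupB"],
                          values=["Value1", "Value2"], fill_value=0.0)

# The outermost column level (Value1/Value2) has no name.
level_names = list(df_pivot.columns.names)
print(level_names)

# Giving that level a name up front avoids renaming a column by position later.
# "Variable" is a hypothetical label picked for this sketch.
named = df_pivot.copy()
named.columns = named.columns.set_names("Variable", level=0)
print(list(named.columns.names))
```

With the level named, the positional rename in the workaround above should no longer be needed, although the melt-then-pivot round trip itself remains.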

asked Sep 10 '18 17:09

jwimberley

1 Answer

I think stack is more straightforward:

df_pivot.stack([1,2]).reset_index()
Out[8]: 
    Time  GroupA  GroupB  Value1  Value2
0  100.0     1.0     1.0      13       0
1  100.0     1.0     2.0      13       0
2  100.0     2.0     1.0      12       0
3  100.0     2.0     2.0      11       0
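For a self-contained check of the round trip, here is a minimal sketch. It assumes hard-coded data matching the df_pivot printed in the question, and it refers to the column levels by name rather than by position, which should be equivalent to stack([1,2]) here. Depending on the pandas version, stack may emit a FutureWarning about its new implementation, and the value dtypes may come back as int or float.

```python
import pandas as pd

# Hard-coded stand-in for the question's data (values taken from the printed df_pivot).
df = pd.DataFrame({
    "Time":   [100.0, 100.0, 100.0, 100.0],
    "GroupA": [1.0, 1.0, 2.0, 2.0],
    "GroupB": [1.0, 2.0, 1.0, 2.0],
    "Value1": [13.0, 16.0, 16.0, 10.0],
    "Value2": [27.0, 20.0, 28.0, 20.0],
})

# Wide form: one row per Time, columns indexed by (value name, GroupA, GroupB).
df_pivot = df.pivot_table(index="Time", columns=["GroupA", "GroupB"],
                          values=["Value1", "Value2"], fill_value=0.0)

# Move the two group levels from the columns back into the row index,
# then turn the row index (Time, GroupA, GroupB) back into ordinary columns.
restored = df_pivot.stack(["GroupA", "GroupB"]).reset_index()
print(restored)
```

Note that pivot_table aggregates (by default with the mean), so this is an exact inverse only when each (Time, GroupA, GroupB) combination occurs once in the original frame. Combinations that were missing originally and filled by fill_value will come back as rows of zeros.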

answered Nov 15 '22 08:11

BENY
