how to unstack (or pivot?) in pandas

Tags:

I have a dataframe that looks like the following:

import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)

This gives me:

Click to copy

Out[458]: df

             BORDER  HOUR1  HOUR2  HOUR3
2014-01-01  GERMANY      2      3      8
2014-01-01   FRANCE      2      3      8
2014-01-01    ITALY      2      3      8
2014-01-02  GERMANY      4      5     12
2014-01-02   FRANCE      4      5     12
2014-01-02    ITALY      4      5     12
2014-01-03  GERMANY      6      7     99
2014-01-03   FRANCE      6      7     99
2014-01-03    ITALY      6      7     99

I want the final dataframe to look something like:

Click to copy

             HOUR  GERMANY  FRANCE  ITALY
2014-01-01   1     2        2       2     
2014-01-01   2     3        3       3
2014-01-01   3     8        8       8 
2014-01-02   1     4        4       4
2014-01-02   2     5        5       5
2014-01-02   3    12       12      12
2014-01-03   1     6        6       6
2014-01-03   2     7        7       7
2014-01-03   3    99       99      99

I've done the following but I'm not quite there:

Click to copy

df['date_col'] = df.index

df2 = melt(df, id_vars=['date_col','BORDER'])  
#Can I keep the same index after melt or do I have to set an index like below?
df2.set_index(['date_col', 'variable'], inplace=True, drop=True)
df2 = df2.sort()

Click to copy

Out[465]: df2

                         BORDER   value
date_col   variable                 
2014-01-01 HOUR1           GERMANY   2
           HOUR1           FRANCE    2
           HOUR1           ITALY     2
           HOUR2           GERMANY   3
           HOUR2           FRANCE    3
           HOUR2           ITALY     3
           HOUR3           GERMANY   8
           HOUR3           FRANCE    8
           HOUR3           ITALY     8
2014-01-02 HOUR1           GERMANY   4
           HOUR1           FRANCE    4
           HOUR1           ITALY     4
           HOUR2           GERMANY   5
           HOUR2           FRANCE    5
           HOUR2           ITALY     5
           HOUR3           GERMANY  12
           HOUR3           FRANCE   12
           HOUR3           ITALY    12
2014-01-03 HOUR1           GERMANY   6
           HOUR1           FRANCE    6
           HOUR1           ITALY     6
           HOUR2           GERMANY   7
           HOUR2           FRANCE    7
           HOUR2           ITALY     7
           HOUR3           GERMANY  99
           HOUR3           FRANCE   99
           HOUR3           ITALY    99

I thought I could unstack df2 to get something that resembles my final dataframe but I get all sorts of errors. I have also tried to pivot this dataframe but can't quite get what I want.

868

asked Jul 08 '14 19:07

2 Answers

We want values (e.g. 'GERMANY') to become column names, and column names (e.g. 'HOUR1') to become values -- a swap of sorts.

The stack method turns column names into index values, and the unstack method turns index values into column names.

So by shifting the values into the index, we can use stack and unstack to perform the swap.

Click to copy

import pandas as pd

datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)

df = df.set_index(['BORDER'], append=True)
df.columns.name = 'HOUR'
df = df.unstack('BORDER')
df = df.stack('HOUR')
df = df.reset_index('HOUR')
df['HOUR'] = df['HOUR'].str.replace('HOUR', '').astype('int')
print(df)

yields

Click to copy

BORDER      HOUR  FRANCE  GERMANY  ITALY
2014-01-01     1       2        2      2
2014-01-01     2       3        3      3
2014-01-01     3       8        8      8
2014-01-02     1       4        4      4
2014-01-02     2       5        5      5
2014-01-02     3      12       12     12
2014-01-03     1       6        6      6
2014-01-03     2       7        7      7
2014-01-03     3      99       99     99

answered Oct 03 '22 22:10

unutbu

Using your df2:

Click to copy

>>> df2.pivot_table(values='value', index=['DATE', 'variable'], columns="BORDER")
BORDER               FRANCE  GERMANY  ITALY
DATE       variable                        
2014-01-01 HOUR1          2        2      2
           HOUR2          3        3      3
           HOUR3          8        8      8
2014-01-02 HOUR1          4        4      4
           HOUR2          5        5      5
           HOUR3         12       12     12
2014-01-03 HOUR1          6        6      6
           HOUR2          7        7      7
           HOUR3         99       99     99

[9 rows x 3 columns]

There is still a bit of cleanup to do if you want to convert the index level "variable" into a column called "HOUR" and strip out the text "HOUR" from the values, but I think that is the basic format you want.

answered Oct 03 '22 20:10

BrenBarn

Related questions
                            
                                What is a good Django workflow?
                            
                                Pyramid project structure
                            
                                Is Dive Into Python outdated? [closed]
                            
                                Numpy and 16-bit PGM
                            
                                run python source code line by line
                            
                                How to check whether virtualenv was created with '--no-site-packages'?
                            
                                Using Python's Format Specification Mini-Language to align floats
                            
                                Check if an open file has been deleted after open in python
                            
                                How do I call a model method in django ModelAdmin fieldsets?
                            
                                SocketServer: getting rid of '[Errno 98] Address already in use'
                            
                                Search within tags with BeautifulSoup Python
                            
                                Gaussian Smoothing an image in python
                            
                                Find minimum distance from point to complicated curve
                            
                                PySide + QTableView example
                            
                                Python: Can a function return an array and a variable?
                            
                                Flask blueprint unit-testing
                            
                                Sphinx apidoc section titles for Python module/package names
                            
                                Is Python incorrectly handling this "arbitrary precision integer"?
                            
                                C array vs NumPy array
                            
                                python NameError: name '__file__' is not defined [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

how to unstack (or pivot?) in pandas

Tags:

python

stack

pandas

pivot

codingknob

People also ask

2 Answers

unutbu

BrenBarn

Recent Activity

Donate For Us