I have a dataframe that looks like the following:
import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)
This gives me:
Out[458]: df
BORDER HOUR1 HOUR2 HOUR3
2014-01-01 GERMANY 2 3 8
2014-01-01 FRANCE 2 3 8
2014-01-01 ITALY 2 3 8
2014-01-02 GERMANY 4 5 12
2014-01-02 FRANCE 4 5 12
2014-01-02 ITALY 4 5 12
2014-01-03 GERMANY 6 7 99
2014-01-03 FRANCE 6 7 99
2014-01-03 ITALY 6 7 99
I want the final dataframe to look something like:
HOUR GERMANY FRANCE ITALY
2014-01-01 1 2 2 2
2014-01-01 2 3 3 3
2014-01-01 3 8 8 8
2014-01-02 1 4 4 4
2014-01-02 2 5 5 5
2014-01-02 3 12 12 12
2014-01-03 1 6 6 6
2014-01-03 2 7 7 7
2014-01-03 3 99 99 99
I've done the following but I'm not quite there:
df['date_col'] = df.index
df2 = melt(df, id_vars=['date_col','BORDER'])
#Can I keep the same index after melt or do I have to set an index like below?
df2.set_index(['date_col', 'variable'], inplace=True, drop=True)
df2 = df2.sort()
df
Out[465]: df2
BORDER value
date_col variable
2014-01-01 HOUR1 GERMANY 2
HOUR1 FRANCE 2
HOUR1 ITALY 2
HOUR2 GERMANY 3
HOUR2 FRANCE 3
HOUR2 ITALY 3
HOUR3 GERMANY 8
HOUR3 FRANCE 8
HOUR3 ITALY 8
2014-01-02 HOUR1 GERMANY 4
HOUR1 FRANCE 4
HOUR1 ITALY 4
HOUR2 GERMANY 5
HOUR2 FRANCE 5
HOUR2 ITALY 5
HOUR3 GERMANY 12
HOUR3 FRANCE 12
HOUR3 ITALY 12
2014-01-03 HOUR1 GERMANY 6
HOUR1 FRANCE 6
HOUR1 ITALY 6
HOUR2 GERMANY 7
HOUR2 FRANCE 7
HOUR2 ITALY 7
HOUR3 GERMANY 99
HOUR3 FRANCE 99
HOUR3 ITALY 99
I thought I could unstack df2 to get something that resembles my final dataframe but I get all sorts of errors. I have also tried to pivot this dataframe but can't quite get what I want.
Pandas provides various built-in methods for reshaping DataFrame. Among them, stack() and unstack() are the 2 most popular methods for restructuring columns and rows (also known as index). stack() : stack the prescribed level(s) from column to row. unstack() : unstack the prescribed level(s) from row to column.
Pandas Unstack is a function that pivots the level of the indexed columns in a stacked dataframe. A stacked dataframe is usually a result of an aggregated groupby function in pandas.
Pandas DataFrame: pivot() functionThe pivot() function is used to reshaped a given DataFrame organized by given index / column values. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns. Column to use to make new frame's index. If None, uses existing index.
Pandas unstack function to get data in wide form When you groupby multiple variables, by default the last level will be on the rows in the wide form. If we want wide form data, but with different variable on column, we can specify the level or variable name to unstack() function.
We want values (e.g. 'GERMANY'
) to become column names, and column names (e.g. 'HOUR1'
) to become values -- a swap of sorts.
The stack
method turns column names into index values, and
the unstack
method turns index values into column names.
So by shifting the values into the index, we can use stack
and unstack
to perform the swap.
import pandas as pd
datelisttemp = pd.date_range('1/1/2014', periods=3, freq='D')
s = list(datelisttemp)*3
s.sort()
df = pd.DataFrame({'BORDER':['GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY','GERMANY','FRANCE','ITALY' ], 'HOUR1':[2 ,2 ,2 ,4 ,4 ,4 ,6 ,6, 6],'HOUR2':[3 ,3 ,3, 5 ,5 ,5, 7, 7, 7], 'HOUR3':[8 ,8 ,8, 12 ,12 ,12, 99, 99, 99]}, index=s)
df = df.set_index(['BORDER'], append=True)
df.columns.name = 'HOUR'
df = df.unstack('BORDER')
df = df.stack('HOUR')
df = df.reset_index('HOUR')
df['HOUR'] = df['HOUR'].str.replace('HOUR', '').astype('int')
print(df)
yields
BORDER HOUR FRANCE GERMANY ITALY
2014-01-01 1 2 2 2
2014-01-01 2 3 3 3
2014-01-01 3 8 8 8
2014-01-02 1 4 4 4
2014-01-02 2 5 5 5
2014-01-02 3 12 12 12
2014-01-03 1 6 6 6
2014-01-03 2 7 7 7
2014-01-03 3 99 99 99
Using your df2
:
>>> df2.pivot_table(values='value', index=['DATE', 'variable'], columns="BORDER")
BORDER FRANCE GERMANY ITALY
DATE variable
2014-01-01 HOUR1 2 2 2
HOUR2 3 3 3
HOUR3 8 8 8
2014-01-02 HOUR1 4 4 4
HOUR2 5 5 5
HOUR3 12 12 12
2014-01-03 HOUR1 6 6 6
HOUR2 7 7 7
HOUR3 99 99 99
[9 rows x 3 columns]
There is still a bit of cleanup to do if you want to convert the index level "variable" into a column called "HOUR" and strip out the text "HOUR" from the values, but I think that is the basic format you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With