I am trying to import data on the price changes of different items. The data is stored in MySQL. I have loaded it into a DataFrame df in a stacked (long) format similar to the following:
ID Type Date Price1 Price2
0001 A 2001-09-20 30 301
0002 A 2001-09-21 31 278
0003 A 2001-09-22 28 299
0004 B 2001-09-18 18 159
0005 B 2001-09-20 21 157
0006 B 2001-09-21 21 162
0007 C 2001-09-19 58 326
0008 C 2001-09-20 61 410
0009 C 2001-09-21 67 383
To perform time series analysis, I want to convert it to a wide format similar to:
                 A               B               C
            Price1  Price2  Price1  Price2  Price1  Price2
Date
2001-09-18    NULL    NULL      18     159    NULL    NULL
2001-09-19    NULL    NULL    NULL    NULL      58     326
2001-09-20      30     301      21     157      61     410
2001-09-21      31     278      21     162      67     383
2001-09-22      28     299    NULL    NULL    NULL    NULL
I have looked at this question, but neither of the suggested approaches does what I want to achieve. The pandas documentation on pivot doesn't seem to cover this case either.
Creating a pandas pivot table: download or import the data you want to use; in the pivot_table function, specify the DataFrame you are summarizing along with the names for the index, columns and values; then specify the type of calculation you want to use, such as the mean.
The pivot() function reshapes a DataFrame according to the given index and column values. It does not support data aggregation; multiple value columns result in a MultiIndex in the columns. Its index parameter names the column to use for the new frame's index; if None, the existing index is used.
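A minimal sketch of both behaviours described above, using a small hypothetical frame (the name prices and its values are made up for illustration): omitting values keeps every remaining column and yields two column levels, while a duplicated (Date, Type) pair makes pivot() raise instead of aggregating.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
prices = pd.DataFrame({
    "Date": ["2001-09-20", "2001-09-20", "2001-09-21"],
    "Type": ["A", "B", "A"],
    "Price1": [30, 21, 31],
    "Price2": [301, 157, 278],
})

# With `values` omitted, both price columns are kept,
# so the result has a two-level MultiIndex in the columns.
wide = prices.pivot(index="Date", columns="Type")
print(wide.columns.nlevels)

# Append a copy of the first row to create a duplicate (Date, Type) pair.
dup = pd.concat([prices, prices.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index="Date", columns="Type")
    raised = False
except ValueError as exc:
    # pivot() cannot decide which of the duplicate values to keep
    raised = True
    print("pivot refused duplicates:", exc)
```

If duplicates are expected, pivot_table() with an explicit aggregation is probably the better fit.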
With the pandas pivot_table() function you can reshape a DataFrame on multiple columns, in the form of an Excel-style pivot table. To group the data, pass the DataFrame to this function together with the columns you want to group by as the index.
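As a rough illustration of grouping on several columns, the sketch below assumes a hypothetical frame named sales with two rows sharing the same (Type, Date) pair; pivot_table() averages them rather than failing.

```python
import pandas as pd

# Hypothetical sample data; the first two rows share the same (Type, Date)
sales = pd.DataFrame({
    "Type": ["A", "A", "A", "B"],
    "Date": ["2001-09-20", "2001-09-20", "2001-09-21", "2001-09-20"],
    "Price1": [30, 32, 31, 21],
    "Price2": [301, 299, 278, 157],
})

# Group by two index columns and average the price columns per group.
summary = pd.pivot_table(
    sales,
    index=["Type", "Date"],
    values=["Price1", "Price2"],
    aggfunc="mean",
)
print(summary)
```

Passing a list to index produces a MultiIndex on the rows; passing columns as well would spread one of the grouping keys across the column axis instead.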
The melt() function reshapes a DataFrame from wide to long format. It is useful when one or more columns are identifier variables and all other columns are unpivoted to the row axis, leaving only two non-identifier columns, named variable and value by default.
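A short sketch of that wide-to-long step, assuming a hypothetical two-row frame with Date and Type as the identifier columns:

```python
import pandas as pd

# Hypothetical wide data: one row per (Date, Type), one column per price
wide = pd.DataFrame({
    "Date": ["2001-09-20", "2001-09-21"],
    "Type": ["A", "A"],
    "Price1": [30, 31],
    "Price2": [301, 278],
})

# Keep Date and Type as identifiers; Price1 and Price2 are unpivoted
# into the default `variable` and `value` columns.
long = wide.melt(id_vars=["Date", "Type"])
print(long)
```

The names of the two generated columns can be changed with the var_name and value_name parameters.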
You can reshape with pivot, or with set_index plus unstack, but you then need swaplevel with sort_index to get the expected MultiIndex in the columns:
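# Option 1: pivot (index and columns must be passed as keywords in pandas 2.0+)
df1 = (df.drop('ID', axis=1)
         .pivot(index='Date', columns='Type')
         .swaplevel(0, 1, axis=1)
         .sort_index(axis=1))

# Option 2: set_index + unstack
df1 = (df.drop('ID', axis=1)
         .set_index(['Date', 'Type'])
         .unstack()
         .swaplevel(0, 1, axis=1)
         .sort_index(axis=1))

# Option 3: select the price columns explicitly instead of dropping ID
df1 = (df.set_index(['Date', 'Type'])[['Price1', 'Price2']]
         .unstack()
         .swaplevel(0, 1, axis=1)
         .sort_index(axis=1))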
print(df1)
Type            A             B             C
           Price1 Price2 Price1 Price2 Price1 Price2
Date
2001-09-18    NaN    NaN   18.0  159.0    NaN    NaN
2001-09-19    NaN    NaN    NaN    NaN   58.0  326.0
2001-09-20   30.0  301.0   21.0  157.0   61.0  410.0
2001-09-21   31.0  278.0   21.0  162.0   67.0  383.0
2001-09-22   28.0  299.0    NaN    NaN    NaN    NaN
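For anyone who wants to try this without a MySQL connection, here is a self-contained sketch that rebuilds the question's sample data by hand and applies the set_index/unstack variant. The pd.to_datetime conversion and the two selection examples at the end are additions of mine, on the assumption that a DatetimeIndex is wanted for time series work.

```python
import pandas as pd

# Sample data copied from the question; in practice df would come from MySQL.
df = pd.DataFrame({
    "ID": ["0001", "0002", "0003", "0004", "0005", "0006", "0007", "0008", "0009"],
    "Type": list("AAABBBCCC"),
    "Date": ["2001-09-20", "2001-09-21", "2001-09-22",
             "2001-09-18", "2001-09-20", "2001-09-21",
             "2001-09-19", "2001-09-20", "2001-09-21"],
    "Price1": [30, 31, 28, 18, 21, 21, 58, 61, 67],
    "Price2": [301, 278, 299, 159, 157, 162, 326, 410, 383],
})

# Assumption: parse the dates so the result gets a DatetimeIndex.
df["Date"] = pd.to_datetime(df["Date"])

df1 = (df.set_index(["Date", "Type"])[["Price1", "Price2"]]
         .unstack()                 # Type moves to the inner column level
         .swaplevel(0, 1, axis=1)   # put Type on top, price name below
         .sort_index(axis=1))       # group the columns by Type
print(df1)

# All prices for one type, dropping dates where that type has no data
series_a = df1["A"].dropna()

# One price across all types (select on the second column level)
price1_all = df1.xs("Price1", axis=1, level=1)
```

Missing combinations come out as NaN, which is why the integer prices are shown as floats in the result.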