Hello and thanks in advance for any help. I have a simple dataframe with two columns. I did not set an index explicitly, but I believe a dataframe gets an integer index that I see along the left side of the output. Question below:
df = pandas.DataFrame(res)
df.columns = ['date', 'pb']
df['date'] = pandas.to_datetime(df['date'])
df.dtypes
date datetime64[ns]
pb float64
dtype: object
date pb
0 2016-04-01 24199.933333
1 2016-03-01 23860.870968
2 2016-02-01 23862.275862
3 2016-01-01 25049.193548
4 2015-12-01 24882.419355
5 2015-11-01 24577.000000
date datetime64[ns]
pb float64
dtype: object
I would like to pivot the dataframe so that I have years across the top (columns): 2016, 2015, etc and a row for each month: 1 - 12.
aggfunc : It is an aggregation function and we can set this param with a list of functions, dict, default is numpy. mean. If it is set to a list of functions, the resulting pivot table forms a hierarchical column and this list of functions will be a top-level column.
Return a copy of the array collapsed into one dimension. Whether to flatten in C (row-major), Fortran (column-major) order, or preserve the C/Fortran ordering from a . The default is 'C'.
Pandas' time series tools are very useful when data is timestamped. Timestamp is the pandas equivalent of python's Datetime. It's the type used for the entries that make up a DatetimeIndex, and other timeseries-oriented data structures in pandas.
The pivot() function is used to reshaped a given DataFrame organized by given index / column values. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns.
Using the .dt accessor
you can create columns for year and month and then pivot on those:
df['Year'] = df['date'].dt.year
df['Month'] = df['date'].dt.month
pd.pivot_table(df,index='Month',columns='Year',values='pb',aggfunc=np.sum)
Alternately if you don't want those other columns you can do:
pd.pivot_table(df,index=df['date'].dt.month,columns=df['date'].dt.year,
values='pb',aggfunc=np.sum)
With my dummy dataset that produces:
Year 2013 2014 2015 2016
date
1 92924.0 102072.0 134660.0 132464.0
2 79935.0 82145.0 118234.0 147523.0
3 86878.0 94959.0 130520.0 138325.0
4 80267.0 89394.0 120739.0 129002.0
5 79283.0 91205.0 118904.0 125878.0
6 77828.0 89884.0 112488.0 121953.0
7 78839.0 94407.0 113124.0 NaN
8 79885.0 97513.0 116771.0 NaN
9 79455.0 99555.0 114833.0 NaN
10 77616.0 98764.0 115872.0 NaN
11 75043.0 95756.0 107123.0 NaN
12 81996.0 102637.0 114952.0 NaN
Using stack
instead of pivot
df = pd.DataFrame(
dict(date=pd.date_range('2013-01-01', periods=42, freq='M'),
pb=np.random.rand(42)))
df.set_index([df.date.dt.month, df.date.dt.year]).pb.unstack()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With