
Pandas Pivot Time-series by year

Tags: python, pandas

Hello and thanks in advance for any help. I have a simple dataframe with two columns. I did not set an index explicitly, but I believe a dataframe gets a default integer index, which is what I see along the left side of the output. My question is below.

import pandas

# `res` is my raw data (not shown here)
df = pandas.DataFrame(res)
df.columns = ['date', 'pb']
df['date'] = pandas.to_datetime(df['date'])
df.dtypes

date    datetime64[ns]
pb             float64
dtype: object


        date            pb
0 2016-04-01  24199.933333
1 2016-03-01  23860.870968
2 2016-02-01  23862.275862
3 2016-01-01  25049.193548
4 2015-12-01  24882.419355
5 2015-11-01  24577.000000



I would like to pivot the dataframe so that the years run across the top as columns (2016, 2015, etc.) and there is one row for each month, 1 through 12.

Travis Millburn asked Jul 27 '16 19:07




2 Answers

Using the .dt accessor you can create columns for year and month and then pivot on those:

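import pandas as pd

df['Year'] = df['date'].dt.year
df['Month'] = df['date'].dt.month
# 'sum' is the string form of np.sum; recent pandas versions warn on the callable
pd.pivot_table(df, index='Month', columns='Year', values='pb', aggfunc='sum')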

Alternatively, if you don't want those extra columns, you can pass the computed month and year directly:

pd.pivot_table(df, index=df['date'].dt.month, columns=df['date'].dt.year,
               values='pb', aggfunc='sum')

With my dummy dataset that produces:

Year             2013      2014      2015      2016
date
1             92924.0  102072.0  134660.0  132464.0
2             79935.0   82145.0  118234.0  147523.0
3             86878.0   94959.0  130520.0  138325.0
4             80267.0   89394.0  120739.0  129002.0
5             79283.0   91205.0  118904.0  125878.0
6             77828.0   89884.0  112488.0  121953.0
7             78839.0   94407.0  113124.0       NaN
8             79885.0   97513.0  116771.0       NaN
9             79455.0   99555.0  114833.0       NaN
10            77616.0   98764.0  115872.0       NaN
11            75043.0   95756.0  107123.0       NaN
12            81996.0  102637.0  114952.0       NaN
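
As a minimal sketch, the same pivot can be run against the six rows shown in the question. The `rows` list and the names `table` and `full` are assumptions for illustration (the original `res` is not shown), and `aggfunc="sum"` stands in for `np.sum`; with one value per month it simply passes the values through. The `reindex` step is an addition, included only so that all twelve months appear as rows, as the question asks.

```python
import pandas as pd

# The six rows shown in the question (stand-in for the unseen `res`).
rows = [
    ("2016-04-01", 24199.933333),
    ("2016-03-01", 23860.870968),
    ("2016-02-01", 23862.275862),
    ("2016-01-01", 25049.193548),
    ("2015-12-01", 24882.419355),
    ("2015-11-01", 24577.000000),
]
df = pd.DataFrame(rows, columns=["date", "pb"])
df["date"] = pd.to_datetime(df["date"])

# Months down the side, years across the top.
table = pd.pivot_table(
    df,
    index=df["date"].dt.month.rename("Month"),
    columns=df["date"].dt.year.rename("Year"),
    values="pb",
    aggfunc="sum",
)

# Optional: force a row for every month 1-12; months with no data become NaN.
full = table.reindex(range(1, 13))
print(full)
```

Without the `reindex`, only the months present in the data (here 1 to 4, 11 and 12) would show up as rows.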
mechanical_meat answered Sep 22 '22 18:09


Using unstack instead of pivot:

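import numpy as np
import pandas as pd

# Dummy data; 'MS' (month start) replaces 'M', which newer pandas deprecates
df = pd.DataFrame(
         dict(date=pd.date_range('2013-01-01', periods=42, freq='MS'),
              pb=np.random.rand(42)))

# Index by (month, year), then move the year level into the columns
df.set_index([df.date.dt.month, df.date.dt.year]).pb.unstack()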
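
One difference from `pivot_table` is worth a sketch: `unstack` needs every (month, year) pair to occur only once, whereas `pivot_table` aggregates duplicates. If the data can hold several rows per month, grouping first should give the same layout. The sample values and the names `month`, `year`, `unstack_failed` and `table` below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: two readings in Jan 2016, one in Feb 2016, one in Jan 2015.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-01-01", "2016-01-15", "2016-02-01", "2015-01-01"]),
    "pb": [10.0, 20.0, 30.0, 40.0],
})
month = df["date"].dt.month.rename("Month")
year = df["date"].dt.year.rename("Year")

# Plain set_index + unstack fails here because (1, 2016) occurs twice.
try:
    df.set_index([month, year])["pb"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# Aggregating first makes every (Month, Year) pair unique, so unstack works.
table = df.groupby([month, year])["pb"].sum().unstack()
print(table)
```

Whether `sum`, `mean` or another aggregation is appropriate depends on what `pb` measures; `sum` is used here only to match the other answer.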


piRSquared answered Sep 20 '22 18:09