 

Find annual average of pandas dataframe with date column

Tags: python, pandas

        id        vi       dates   f_id
0  5532714  0.549501  2015-07-07  ff_22
1  5532715  0.540969  2015-07-08  ff_22
2  5532716  0.531477  2015-07-09  ff_22
3  5532717  0.521029  2015-07-10  ff_22
4  5532718  0.509694  2015-07-11  ff_22

In the dataframe above, I want to find the average value of vi for each year. This does not work:

df.groupby(df.dates.year)['vi'].transform(mean)

I get this error:

AttributeError: 'Series' object has no attribute 'year'

How can I fix this?

user308827 asked Jul 20 '17 04:07


2 Answers

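First make sure that dates has a datetime dtype, then use the .dt accessor, as in .dt.year. The error occurs because year is not an attribute of the Series itself; datetime properties of a Series are reached through .dt: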

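# Convert the dates column to datetime so that the .dt accessor is available
df['dates'] = pd.to_datetime(df.dates)
# Broadcast each year's mean of vi back onto every row of that year
df.groupby(df.dates.dt.year)['vi'].transform('mean')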

Output:

0    0.530534
1    0.530534
2    0.530534
3    0.530534
4    0.530534
Name: vi, dtype: float64
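
Note that transform returns one value per row, which is why the same mean is repeated five times above. If a single value per year is wanted instead, a plain .mean() on the same grouping should do it. The following is a minimal sketch, assuming the five sample rows from the question; the variable names year, per_row and per_year are only illustrative:

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "id": [5532714, 5532715, 5532716, 5532717, 5532718],
    "vi": [0.549501, 0.540969, 0.531477, 0.521029, 0.509694],
    "dates": ["2015-07-07", "2015-07-08", "2015-07-09",
              "2015-07-10", "2015-07-11"],
    "f_id": ["ff_22"] * 5,
})

# Strings must be converted to datetime before .dt can be used
df["dates"] = pd.to_datetime(df["dates"])
year = df["dates"].dt.year

# transform keeps the original shape: one (repeated) mean per row
per_row = df.groupby(year)["vi"].transform("mean")

# mean aggregates: one row per year, indexed by the year
per_year = df.groupby(year)["vi"].mean()

print(per_row)
print(per_year)
```

Use transform when the yearly mean should sit next to each original row (for example as a new column), and mean when a compact year-by-year summary is enough.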
Scott Boston answered Sep 17 '22 14:09


Updating and completing @piRsquared's example below for recent versions of pandas (e.g. v1.1.0), using pd.Grouper instead of TimeGrouper, which was deprecated:

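import pandas as pd
import numpy as np

# Four years of daily data on a DatetimeIndex named 'dates'
tidx = pd.date_range('2010-01-01', '2013-12-31', name='dates')
np.random.seed([3,1415])
df = pd.DataFrame(dict(vi=np.random.rand(tidx.size)), tidx)

# Group by calendar year; pandas 2.2+ prefers the alias 'YE' over 'Y'
df.groupby(pd.Grouper(freq='1Y')).mean()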
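
The example above groups on a DatetimeIndex. In the question the dates live in an ordinary column, so the Grouper needs to be pointed at that column with key. Here is a rough sketch, assuming the question's sample rows; it passes a pd.offsets.YearEnd() object as freq rather than a string alias, which sidesteps the 'Y' versus 'YE' naming difference between pandas versions. The name yearly is only illustrative:

```python
import pandas as pd

# Sample rows from the question, with dates as a regular column
df = pd.DataFrame({
    "vi": [0.549501, 0.540969, 0.531477, 0.521029, 0.509694],
    "dates": pd.to_datetime(["2015-07-07", "2015-07-08", "2015-07-09",
                             "2015-07-10", "2015-07-11"]),
})

# key= tells Grouper which column holds the datetimes;
# YearEnd() bins the rows by calendar year
yearly = df.groupby(pd.Grouper(key="dates", freq=pd.offsets.YearEnd()))["vi"].mean()

print(yearly)
```

With this approach the result is labelled by the year-end date (2015-12-31 here) rather than by the integer year, which is the main visible difference from grouping on .dt.year.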
tagoma answered Sep 18 '22 14:09