To get the monthly average values of a DataFrame that has daily rows in a 'Sentiment' column, I would first convert the date column, df['date'], into the index of the DataFrame df: df.set_index('date', inplace=True). Then I'll convert the index dates into a month index: df.
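The snippet above is cut off before the month conversion, so the following is only a sketch of one way to finish the idea, not necessarily what its author wrote. It assumes a 'date' column and a 'Sentiment' column as named in the text; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical daily data; only the column names come from the text above.
df = pd.DataFrame({
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-02-01', '2021-02-03']),
    'Sentiment': [1.0, 3.0, 2.0, 6.0],
})

# Make the dates the index, then turn them into monthly periods.
df = df.set_index('date')
month_index = df.index.to_period('M')

# Average the daily values within each month.
monthly_mean = df.groupby(month_index)['Sentiment'].mean()
print(monthly_mean)
```

Grouping on a PeriodIndex keeps one key per calendar month, so the same month in different years is not merged.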
The "Hello, World!" of pandas GroupBy: you call .groupby() and pass the name of the column that you want to group on, which is "state". Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation. You can pass a lot more than just a single column name to .
Managed to do it:
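import pandas as pd
b = pd.read_csv('b.dat')
# parse the 'date' column and use it as the index
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
# group by the (month, year) pair taken from the DatetimeIndex
b.groupby(by=[b.index.month, b.index.year])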
Or
b.groupby(pd.Grouper(freq='M')) # update for v0.21+
(update: 2018)
Note that pd.TimeGrouper is deprecated and will be removed. Use instead:
df.groupby(pd.Grouper(freq='M'))
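As a small illustration of the pd.Grouper call, here is a sketch on made-up data with a DatetimeIndex. It passes the pd.offsets.MonthEnd() offset object instead of the 'M' string, because recent pandas releases rename the month-end alias to 'ME'; the offset object should behave the same way without depending on the alias spelling.

```python
import pandas as pd

# Hypothetical sample data indexed by date (sorted for clarity).
df = pd.DataFrame(
    {'Values': [20, 15, 5, 10]},
    index=pd.to_datetime(['2017-09-01', '2017-10-01', '2017-10-05', '2017-10-20']),
)

# Group the index into calendar months; each group is labelled by its month end.
monthly = df.groupby(pd.Grouper(freq=pd.offsets.MonthEnd()))['Values'].sum()
print(monthly)
```

The result has one row per month, keyed by the last day of that month.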
One solution which avoids MultiIndex is to create a new datetime column setting day = 1. Then group by this column.
df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
'Values': [5, 10, 15, 20]})
# normalize day to beginning of month, 4 alternative methods below
df['YearMonth'] = df['Date'] + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1)
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D')
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))
df['YearMonth'] = df['Date'].dt.normalize().map(pd.tseries.offsets.MonthBegin().rollback)
Then use groupby as normal:
g = df.groupby('YearMonth')
res = g['Values'].sum()
# YearMonth
# 2017-09-01 20
# 2017-10-01 30
# Name: Values, dtype: int64
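The four normalization methods are presented as interchangeable. A quick way to check that on your own data is to compute each one separately and compare the results. This is a sketch on made-up dates, including a month-end date as an edge case; the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical dates, including the first and last day of a month.
dates = pd.Series(pd.to_datetime(
    ['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01', '2017-10-31']
))

# Each expression maps a date to the first day of its month.
via_offsets = dates + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1)
via_timedelta = dates - pd.to_timedelta(dates.dt.day - 1, unit='D')
via_replace = dates.map(lambda dt: dt.replace(day=1))
via_rollback = dates.dt.normalize().map(pd.tseries.offsets.MonthBegin().rollback)

# Compare the resulting timestamps element by element.
all_equal = (
    via_offsets.tolist() == via_timedelta.tolist()
    == via_replace.tolist() == via_rollback.tolist()
)
print(all_equal)
```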
Comparison with pd.Grouper
The subtle benefit of this solution is that, unlike pd.Grouper, the grouper index is normalized to the beginning of each month rather than the end, so you can easily extract groups via get_group:
some_group = g.get_group('2017-10-01')
Calculating the last day of October is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but this is only applicable for a PeriodIndex grouper.
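To make the difference in group keys concrete, the sketch below groups the same made-up data both ways and prints the resulting keys. The month-end grouper is written with the pd.offsets.MonthEnd() offset object rather than an alias string, which is an assumption made here to stay independent of alias renames across pandas versions.

```python
import pandas as pd

# Hypothetical sample data, matching the shape used in the answer.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2017-09-01', '2017-10-01', '2017-10-05', '2017-10-20']),
    'Values': [20, 15, 5, 10],
})

# Month-end grouper: keys are the last day of each month.
by_grouper = df.groupby(
    pd.Grouper(key='Date', freq=pd.offsets.MonthEnd())
)['Values'].sum()

# Day-normalized column: keys are the first day of each month.
month_start = df['Date'].map(lambda dt: dt.replace(day=1))
by_month_start = df.groupby(month_start)['Values'].sum()

print(list(by_grouper.index))
print(list(by_month_start.index))
```

Both approaches produce the same sums; only the labels differ, and the month-start labels do not require knowing how many days each month has.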
An alternative to the above idea is to convert to a string, e.g. convert datetime 2017-10-XX to string '2017-10'. However, this is not recommended since you lose all the efficiency benefits of a datetime series (stored internally as numerical data in a contiguous memory block) versus an object series of strings (stored as an array of pointers).
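The dtype difference can be inspected directly. The sketch below, on made-up dates, builds both representations and prints their dtypes; the exact name of the string dtype can vary between pandas versions, so the code only checks whether each result is still datetime-typed.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical dates for illustration.
dates = pd.Series(pd.to_datetime(['2017-10-05', '2017-10-20', '2017-09-01']))

# Normalizing the day keeps a datetime series.
as_datetime = dates.map(lambda dt: dt.replace(day=1))

# Formatting to text produces a series of strings.
as_string = dates.dt.strftime('%Y-%m')

print(as_datetime.dtype, is_datetime64_any_dtype(as_datetime))
print(as_string.dtype, is_datetime64_any_dtype(as_string))
```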
To group time-series data you can use the resample method. For example, to group by month:
df.resample(rule='M', on='date')['Values'].sum()
The list of offset aliases can be found in the pandas time series documentation.
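A self-contained version of the resample call might look like the sketch below. The data is made up, and the rule is given as the pd.offsets.MonthEnd() offset object rather than the 'M' string, since recent pandas releases spell the month-end alias 'ME'.

```python
import pandas as pd

# Hypothetical sample data with a 'date' column, as in the one-liner above.
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-09-01', '2017-10-01', '2017-10-05', '2017-10-20']),
    'Values': [20, 15, 5, 10],
})

# Resample on the 'date' column into calendar months and sum each month.
monthly = df.resample(rule=pd.offsets.MonthEnd(), on='date')['Values'].sum()
print(monthly)
```

Unlike a plain groupby, resample also emits rows for months that fall inside the date range but contain no data.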
Slightly alternative solution to @jpp's but outputting a YearMonth string:
df['YearMonth'] = pd.to_datetime(df['Date']).apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))
res = df.groupby('YearMonth')['Values'].sum()
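One thing to watch with the format call above: it does not zero-pad the month, so September 2017 becomes '2017-9', which sorts after '2017-10' when compared as text. A possible variant, sketched here on made-up data, uses dt.strftime('%Y-%m') so that the string keys sort chronologically.

```python
import pandas as pd

# Hypothetical sample data, matching the shape used in the earlier answer.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
    'Values': [5, 10, 15, 20],
})

# Zero-padded 'YYYY-MM' strings keep lexical order equal to date order.
df['YearMonth'] = df['Date'].dt.strftime('%Y-%m')
res = df.groupby('YearMonth')['Values'].sum()
print(res)
```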