I am new to Pandas time series and DataFrames and am struggling to get this simple task done. I have a dataset "data" (a 1-dimensional float32 NumPy array) with one value for each day from 1/1/2004 to 12/31/2008. The dates are stored as a list of datetime objects, "dates". Basically, I would like to calculate a complete "standard year": the average value of each day of the year (1-365) across all years. I started from this similar (?) question (Getting the average of a certain hour on weekdays over several years in a pandas dataframe), but could not get the desired result: a time series of 365 "average" days, e.g. the average of every 1st of January, every 2nd of January, and so on.
A small example script:
import numpy as np
import pandas as pd
import datetime
startdate = datetime.datetime(2004, 1, 1)
enddate = datetime.datetime(2008, 1, 1)
# Number of days in the range, counting both startdate and enddate
days = (enddate + datetime.timedelta(days=1) - startdate).days
data = np.random.random(days)
dates = [startdate + datetime.timedelta(days=x) for x in range(days)]
ts = pd.Series(data, dates)
# My attempt: this groups by (year, day of month), which is not the result I want
test = ts.groupby(lambda x: (x.year, x.day)).mean()
Group by the month and day, rather than the year and day:
test = ts.groupby([ts.index.month, ts.index.day]).mean()
yields
1   1     0.499264
    2     0.449357
    3     0.498883
...
12  17    0.408180
    18    0.317682
    19    0.467238
...
    29    0.413721
    30    0.399180
    31    0.828423
Length: 366, dtype: float64
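Note that the result has 366 entries rather than 365, because 2004 is a leap year and 29 February forms its own group. If a strict 365-day year is wanted, one option is to drop 29 February before grouping. The following is a minimal sketch, not a definitive implementation: it assumes daily data over the full 2004-2008 range from the question, uses random placeholder values, and the names `idx`, `is_feb29`, `standard_year`, and `standard_365` are illustrative only.

```python
import numpy as np
import pandas as pd

# Daily index for the range described in the question (placeholder random data).
idx = pd.date_range("2004-01-01", "2008-12-31", freq="D")
rng = np.random.default_rng(0)
ts = pd.Series(rng.random(len(idx)), index=idx)

# Average each calendar day (month, day) across all years.
standard_year = ts.groupby([ts.index.month, ts.index.day]).mean()

# Assumption: a 365-day result is wanted, so 29 February is excluded first.
is_feb29 = (ts.index.month == 2) & (ts.index.day == 29)
ts_no_leap = ts[~is_feb29]
standard_365 = ts_no_leap.groupby(
    [ts_no_leap.index.month, ts_no_leap.index.day]
).mean()

print(len(standard_year), len(standard_365))  # 366 365
```

Grouping on `ts.index.dayofyear` instead would also give one value per day number, but in leap years every date after 28 February is shifted by one, so the same day number would mix different calendar dates.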