I have a dataframe containing a full year of temperature readings, one per second:
YYYY-MO-DD HH-MI-SS_SSS TEMPERATURE (C)
2016-09-30 23:59:55.923 28.63
2016-09-30 23:59:56.924 28.61
2016-09-30 23:59:57.923 28.63
... ...
2017-05-30 23:59:57.923 30.02
I want to create a new dataframe that takes each week or month of values and averages them by hour of day (a kind of moving average, but per hour). For the monthly case, the result would look like this:
Date TEMPERATURE (C)
2016-09 00:00:00 28.63
2016-09 01:00:00 27.53
2016-09 02:00:00 27.44
...
2016-10 00:00:00 28.61
... ...
I know I can split the df into 12 dfs, one per month, and use:
hour = pd.to_timedelta(df['YYYY-MO-DD HH-MI-SS_SSS'].dt.hour, unit='h')
df2 = df.groupby(hour).mean()
But I'm looking for a better and faster way.
Thanks!
Here's an alternate method of converting your date and time columns:
df['datetime'] = pd.to_datetime(df['YYYY-MO-DD'] + ' ' + df['HH-MI-SS_SSS'])
Additionally, you can group by both week and hour to get a MultiIndex result (instead of creating and managing 12 dfs):
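# weekofyear was removed in pandas 2.0; isocalendar().week is its replacement
df.groupby([df.datetime.dt.isocalendar().week, df.datetime.dt.hour])['TEMPERATURE (C)'].mean()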
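For the monthly case asked about in the question, the same idea can be sketched as follows. This is a minimal example on synthetic data, assuming a parsed `datetime` column like the one created above; the sample values and the names `month`, `hour`, and `hourly_by_month` are illustrative only, not part of the original data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption): one reading per minute
# over two months, instead of one reading per second over a year.
idx = pd.date_range("2016-09-01", "2016-10-31 23:59:00", freq="min")
df = pd.DataFrame({
    "datetime": idx,
    "TEMPERATURE (C)": 25 + 5 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24),
})

# Build the two grouping keys: calendar month (as a Period) and hour of day.
month = df["datetime"].dt.to_period("M").rename("month")
hour = df["datetime"].dt.hour.rename("hour")

# One groupby pass yields a Series indexed by (month, hour).
hourly_by_month = df.groupby([month, hour])["TEMPERATURE (C)"].mean()
print(hourly_by_month.head())
```

If a wide layout is easier to read, `hourly_by_month.unstack("month")` should give one row per hour and one column per month.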
