I have a pandas dataframe where the index is the date, from year 2007 to 2017.
I'd like to calculate the mean of each weekday for each year. I am able to group by year:
groups = df.groupby(TimeGrouper('A'))
years = DataFrame()
for name, group in groups:
years[name.year] = group.values
This is the way I create a new dataframe (years) where in each column I obtain each year of the time series. If I want to see the statistics of each years (for example, the mean):
print(years.mean())
But now I would like to separate each day of the week for each year, in order to obtain the mean of each weekday for all of then.
The only thing I know is:
year=df[(df.index.year==2007)]
day_week=df[(df.index.weekday==2)]
The problem with this is that I have to change 7 times the day of the week, and then repeat this for 11 years (my time series begins on 2007 and ends on 2017), so I must do it 77 times!
Is there a way to group time by years and weekday in order to make this faster?
It seems you need groupby
by DatetimeIndex.year
with DatetimeIndex.weekday
:
rng = pd.date_range('2017-04-03', periods=10, freq='10M')
df = pd.DataFrame({'a': range(10)}, index=rng)
print (df)
a
2017-04-30 0
2018-02-28 1
2018-12-31 2
2019-10-31 3
2020-08-31 4
2021-06-30 5
2022-04-30 6
2023-02-28 7
2023-12-31 8
2024-10-31 9
df1 = df.groupby([df.index.year, df.index.weekday]).mean()
print (df1)
a
2017 6 0
2018 0 2
2 1
2019 3 3
2020 0 4
2021 2 5
2022 5 6
2023 1 7
6 8
2024 3 9
df1 = df.groupby([df.index.year, df.index.weekday]).mean().reset_index()
df1 = df1.rename(columns={'level_0':'years','level_1':'weekdays'})
print (df1)
years weekdays a
0 2017 6 0
1 2018 0 2
2 2018 2 1
3 2019 3 3
4 2020 0 4
5 2021 2 5
6 2022 5 6
7 2023 1 7
8 2023 6 8
9 2024 3 9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With