Pandas

Question

I'm new to Pandas. I have a dataframe where I'm looking at Horse results. I'm trying to get a rolling mean for position finished results in a column for the last 30 days for each horse. Here's an example of two horses from the dataframe:

        Horse            Position  OR   RaceDate    Weight
125283  cookie ring             4  59.0 2016-04-25  52.272727
126134  a boy named sue         7  46.0 2016-05-31  54.090909
137654  a boy named sue         4  49.0 2017-01-25  57.727273
138434  a boy named sue         8  48.0 2017-02-04  55.909091
138865  a boy named sue         2  48.0 2017-02-10  51.363636
140720  a boy named sue         3  50.0 2017-03-10  54.545455
141387  a boy named sue         7  49.0 2017-03-22  59.545455
143850  cookie ring             11  54.0 2017-05-25 56.818182
144203  cookie ring             9  54.0 2017-06-03  50.000000

So I need to groupby each horse and then apply a rolling mean for 90 days. Which I'm doing by calling the following:

df['PositionAv90D'] = df.set_index('RaceDate').groupby('Horse').rolling("90d")['Position'].mean().reset_index()

But that is returning a data frame with 3 columns and is still indexed to the Horse. Example here:

0          a b celebration 2011-08-24       3.000000
1          a b celebration 2011-09-15       4.500000
2          a b celebration 2012-05-29       4.000000
3        a beautiful dream 2016-10-21       2.333333
4        a big sky brewing 2008-04-11       2.000000
5        a big sky brewing 2008-07-08       7.500000
6        a big sky brewing 2008-08-11      10.000000
7        a big sky brewing 2008-09-20       9.000000
8        a big sky brewing 2008-12-30       4.333333
9        a big sky brewing 2009-01-21       3.666667
10       a big sky brewing 2009-02-20       3.777778

I need a column that is indexed the same as my original dataframe.

Can you help?

S.B.G · Accepted Answer

Using set_index() will delete the original index, so use reset_index() first which will create a new column called 'index' containing your original index. Then inset of reset_index() at the end (which just creates an index 0, 1, 2...etc) use set_index('index') to go back to the original

So if you do the following, I think it will work:

df['PositionAv90D'] = df.reset_index().set_index('RaceDate').groupby('Horse').rolling("90d")['Position'].mean().set_index('index')

A simple data sample would be good to test it against, its a bit difficult to recreate from what you have given

EDIT 1:

Since you're switching indexes it's easier to split up a bit, see below, I've created some sample data which I think is similar to what you're getting at:

df = pd.DataFrame({'foo': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'bar': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'baz': [11, 12, 13, 14, 15, 16, 17, 18, 19]},
                  index = [14, 15, 16, 17, 18, 19, 20, 21, 22])

df.reset_index(inplace=True)    # This gives us index 0,1,2... and a new col 'index'
df.set_index('baz', inplace=True)    # Replace with date in yours
# This next bit does the groupby and rolling, which will give a df 
# with a multi index of foo and baz, then reset_index(0) to remove the foo index level
# so that it matches the original df index so that you can add it as a new column
df['roll'] = df.groupby('foo')['bar'].rolling(3).sum().reset_index(0,drop=True)
df.reset_index(inplace=True)    # brings baz back into the df as a column
df.set_index('index', inplace=True)   # sets the index back to the original

This will give you a new column in the original df with the rolling values. In my example you will have NaN for the first 2 values in each group, since the window only starts at idx = window size. So in your case the first 89 days in each group will be NaN. You might need to add an additional step to select only the last 30 days from the resulting DataFrame

Pandas - rolling mean with groupby

Tags:

python

Niazipan

Video Answer

1 Answers

S.B.G

Recent Activity

Donate For Us