I'm new to R and also stackoverflow, so pls bear with me if I'm doing something wrong here...
So I'm working on soccer data, which looks like this:
Div Date HomeTeam AwayTeam FTHG FTAG avgHG_league avgHG_team
1 D1 14/08/15 Bayern Munich Hamburg 5 0 NA 0
2 D1 15/08/15 Augsburg Hertha 0 1 5.000000 0
3 D1 15/08/15 Darmstadt Hannover 2 2 2.500000 0
4 D1 15/08/15 Dortmund M'gladbach 4 0 2.333333 0
5 D1 15/08/15 Leverkusen Hoffenheim 2 1 2.750000 0
6 D1 15/08/15 Mainz Ingolstadt 0 1 2.600000 0
I've created the avgHG_league column to give me the average goals that teams at home scored so far in the season, with the following code:
BLfiltered <- BLfiltered %>%
mutate(avgHG_league = lag(cummean(FTHG),1))
Now in the avgHG_team column I want to do pretty much the same, but instead of calculating the mean of goals of all home teams together, I only want to calculate the mean of goals that only this particular hometeam has scored at home so far in the season (but not including this game)...
Do you have any ideas?
Thanks!
/E: column "FTHG" gives us the home goals from each match
Here is my solution. It uses the dplyr package which I assume you are already using since you are calling cummean
in your example. For simplicity I have called the data sd
for soccer data.
sd = mutate(sd,avgHG_league=lag(cummean(FTHG),1,0)) %.% group_by(HomeTeam) %.% mutate(avgHG_Team=lag(cummean(FTHG),1,0)) %.% ungroup()
Note: Adding the 0
in the lag statments (technically default=0
) places 0
instead of NA
for your initial values which I believe is what you want.
Results from the following dummy data
Div Date HomeTeam AwayTeam FTHG FTAG
1 D1 14/08/15 Bayern Munich Hamburg 5 0
2 D1 15/08/15 Augsburg Hertha 0 1
3 D1 15/08/15 Darmstadt Hannover 2 2
4 D1 15/08/15 Dortmund M'gladbach 4 0
5 D1 15/08/15 Leverkusen Hoffenheim 2 1
6 D1 15/08/15 Mainz Ingolstadt 0 1
7 D1 15/09/15 Bayern Munich Hamburg 0 0
8 D1 15/10/15 Augsburg Hertha 0 0
9 D1 15/10/15 Darmstadt Hannover 0 0
10 D1 15/10/15 Dortmund M'gladbach 0 0
11 D1 15/10/15 Leverkusen Hoffenheim 0 0
12 D1 15/10/15 Mainz Ingolstadt 0 0
13 D1 15/11/15 Bayern Munich Hamburg 0 0
14 D1 15/10/16 Augsburg Hertha 0 0
15 D1 15/11/16 Darmstadt Hannover 0 0
16 D1 15/10/17 Dortmund M'gladbach 0 0
17 D1 15/11/17 Leverkusen Hoffenheim 0 0
18 D1 15/10/18 Mainz Ingolstadt 0 0
Div Date HomeTeam AwayTeam FTHG FTAG avgHG_league avgHG_Team
1 D1 14/08/15 Bayern Munich Hamburg 5 0 0.0000000 0.0
2 D1 15/08/15 Augsburg Hertha 0 1 5.0000000 0.0
3 D1 15/08/15 Darmstadt Hannover 2 2 2.5000000 0.0
4 D1 15/08/15 Dortmund M'gladbach 4 0 2.3333333 0.0
5 D1 15/08/15 Leverkusen Hoffenheim 2 1 2.7500000 0.0
6 D1 15/08/15 Mainz Ingolstadt 0 1 2.6000000 0.0
7 D1 15/09/15 Bayern Munich Hamburg 0 0 2.1666667 5.0
8 D1 15/10/15 Augsburg Hertha 0 0 1.8571429 0.0
9 D1 15/10/15 Darmstadt Hannover 0 0 1.6250000 2.0
10 D1 15/10/15 Dortmund M'gladbach 0 0 1.4444444 4.0
11 D1 15/10/15 Leverkusen Hoffenheim 0 0 1.3000000 2.0
12 D1 15/10/15 Mainz Ingolstadt 0 0 1.1818182 0.0
13 D1 15/11/15 Bayern Munich Hamburg 0 0 1.0833333 2.5
14 D1 15/10/16 Augsburg Hertha 0 0 1.0000000 0.0
15 D1 15/11/16 Darmstadt Hannover 0 0 0.9285714 1.0
16 D1 15/10/17 Dortmund M'gladbach 0 0 0.8666667 2.0
17 D1 15/11/17 Leverkusen Hoffenheim 0 0 0.8125000 1.0
18 D1 15/10/18 Mainz Ingolstadt 0 0 0.7647059 0.0
A similar data processing approach can be seen here: https://blog.rstudio.org/2014/01/17/introducing-dplyr/
This is also a great resource for data manipulation in R using tidyr and dplyr: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With