Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: cummean() over subset

Tags:

r

I'm new to R and also stackoverflow, so pls bear with me if I'm doing something wrong here...

So I'm working on soccer data, which looks like this:

  Div     Date      HomeTeam   AwayTeam FTHG FTAG avgHG_league avgHG_team
1  D1 14/08/15 Bayern Munich    Hamburg    5    0           NA          0
2  D1 15/08/15      Augsburg     Hertha    0    1     5.000000          0
3  D1 15/08/15     Darmstadt   Hannover    2    2     2.500000          0
4  D1 15/08/15      Dortmund M'gladbach    4    0     2.333333          0
5  D1 15/08/15    Leverkusen Hoffenheim    2    1     2.750000          0
6  D1 15/08/15         Mainz Ingolstadt    0    1     2.600000          0

I've created the avgHG_league column to give me the average goals that teams at home scored so far in the season, with the following code:

BLfiltered <- BLfiltered %>%
  mutate(avgHG_league = lag(cummean(FTHG),1))

Now in the avgHG_team column I want to do pretty much the same, but instead of calculating the mean of goals of all home teams together, I only want to calculate the mean of goals that only this particular hometeam has scored at home so far in the season (but not including this game)...

Do you have any ideas?

Thanks!

/E: column "FTHG" gives us the home goals from each match

like image 767
Konninger Avatar asked Oct 30 '22 01:10

Konninger


1 Answers

Here is my solution. It uses the dplyr package which I assume you are already using since you are calling cummean in your example. For simplicity I have called the data sd for soccer data.

sd = mutate(sd,avgHG_league=lag(cummean(FTHG),1,0)) %.% group_by(HomeTeam) %.% mutate(avgHG_Team=lag(cummean(FTHG),1,0)) %.% ungroup()

Note: Adding the 0 in the lag statments (technically default=0) places 0 instead of NA for your initial values which I believe is what you want.

Results from the following dummy data

Data

    Div     Date      HomeTeam   AwayTeam FTHG FTAG
1   D1 14/08/15 Bayern Munich    Hamburg    5    0
2   D1 15/08/15      Augsburg     Hertha    0    1
3   D1 15/08/15     Darmstadt   Hannover    2    2
4   D1 15/08/15      Dortmund M'gladbach    4    0
5   D1 15/08/15    Leverkusen Hoffenheim    2    1
6   D1 15/08/15         Mainz Ingolstadt    0    1
7   D1 15/09/15 Bayern Munich    Hamburg    0    0
8   D1 15/10/15      Augsburg     Hertha    0    0
9   D1 15/10/15     Darmstadt   Hannover    0    0
10  D1 15/10/15      Dortmund M'gladbach    0    0
11  D1 15/10/15    Leverkusen Hoffenheim    0    0
12  D1 15/10/15         Mainz Ingolstadt    0    0
13  D1 15/11/15 Bayern Munich    Hamburg    0    0
14  D1 15/10/16      Augsburg     Hertha    0    0
15  D1 15/11/16     Darmstadt   Hannover    0    0
16  D1 15/10/17      Dortmund M'gladbach    0    0
17  D1 15/11/17    Leverkusen Hoffenheim    0    0
18  D1 15/10/18         Mainz Ingolstadt    0    0

Results

    Div   Date        HomeTeam   AwayTeam FTHG FTAG avgHG_league avgHG_Team
1   D1 14/08/15 Bayern Munich    Hamburg    5    0    0.0000000        0.0
2   D1 15/08/15      Augsburg     Hertha    0    1    5.0000000        0.0
3   D1 15/08/15     Darmstadt   Hannover    2    2    2.5000000        0.0
4   D1 15/08/15      Dortmund M'gladbach    4    0    2.3333333        0.0
5   D1 15/08/15    Leverkusen Hoffenheim    2    1    2.7500000        0.0
6   D1 15/08/15         Mainz Ingolstadt    0    1    2.6000000        0.0
7   D1 15/09/15 Bayern Munich    Hamburg    0    0    2.1666667        5.0
8   D1 15/10/15      Augsburg     Hertha    0    0    1.8571429        0.0
9   D1 15/10/15     Darmstadt   Hannover    0    0    1.6250000        2.0
10  D1 15/10/15      Dortmund M'gladbach    0    0    1.4444444        4.0
11  D1 15/10/15    Leverkusen Hoffenheim    0    0    1.3000000        2.0
12  D1 15/10/15         Mainz Ingolstadt    0    0    1.1818182        0.0
13  D1 15/11/15 Bayern Munich    Hamburg    0    0    1.0833333        2.5
14  D1 15/10/16      Augsburg     Hertha    0    0    1.0000000        0.0
15  D1 15/11/16     Darmstadt   Hannover    0    0    0.9285714        1.0
16  D1 15/10/17      Dortmund M'gladbach    0    0    0.8666667        2.0
17  D1 15/11/17    Leverkusen Hoffenheim    0    0    0.8125000        1.0
18  D1 15/10/18         Mainz Ingolstadt    0    0    0.7647059        0.0

Sources

A similar data processing approach can be seen here: https://blog.rstudio.org/2014/01/17/introducing-dplyr/

This is also a great resource for data manipulation in R using tidyr and dplyr: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

like image 61
Cole Avatar answered Nov 08 '22 15:11

Cole