New to R. Small rep of my df:
PTS_TeamHome <- c(101,87,94,110,95)
PTS_TeamAway <- c(95,89,105,111,121)
TeamHome <- c("LAL", "HOU", "SAS", "MIA", "LAL")
TeamAway <- c("IND", "LAL", "LAL", "HOU", "NOP")
df <- data.frame(cbind(TeamHome, TeamAway,PTS_TeamHome,PTS_TeamAway))
df
TeamHome TeamAway PTS_TeamHome PTS_TeamAway
LAL IND 101 95
HOU LAL 87 89
SAS LAL 94 105
MIA HOU 110 111
LAL NOP 95 121
Imagine these are the first four games of a season with 1230 games. I want to calculate the cumulative points per game (mean) at any given time for the home team and the visiting team.
The output would look like this:
TeamHome TeamAway PTS_TeamHome PTS_TeamAway HOMETEAM_AVGCUMPTS ROADTEAM_AVGCUMPTS
1 LAL IND 101 95 101 95
2 HOU LAL 87 89 87 95
3 SAS LAL 94 105 94 98.33
4 MIA HOU 110 111 110 99
5 LAL NOP 95 121 97.5 121
Note that what the formula does for the fifth game for the home team. Since the LAL is the home team it looks for how many points has LAL scored when playing at home or on the road. In this case (101 + 89 + 105 + 95) / 4 = 97.5
Here is what I tried without much success:
lst <- list()
for(i in 1:nrow(df)) lst[[i]] <- ( cumsum(df[which(df$TEAM1[1:i]==df$TEAM1[i]),df$PTS_TeamAway,0])
+ cumsum(df[which(df$TEAM2[1:i]==df$TEAM1[i]),df$PTS_TeamHome,0]) )
/ #divided by number of games
df$HOMETEAM_AVGCUMPTS <- unlist(lst)
I wanted to calculate the cumulative PTS and then the number of games to divide it by but none of this worked.
I would argue that you should restructure your data in a tidier format with two rows per game: one row for the visiting team and one row for the home team. It is much easier to work with data that is in a tidy/long format.
library(dplyr)
library(tidyr)
df %>%
mutate(game = row_number()) %>%
gather(location, team, TeamHome, TeamAway) %>%
gather(location2, points, PTS_TeamHome, PTS_TeamAway) %>%
filter(
(location == "TeamHome" & location2 == "PTS_TeamHome") |
(location == "TeamAway" & location2 == "PTS_TeamAway")
) %>%
select(-location2) %>%
arrange(game) %>%
group_by(team) %>%
mutate(run_mean_points = cummean(points))
# note that cbind() is removed.
df <- data.frame(TeamHome, TeamAway,PTS_TeamHome,PTS_TeamAway, stringsAsFactors = FALSE)
Source: local data frame [10 x 5]
Groups: team
game location team points run_mean_points
1 1 TeamHome LAL 101 101.00000
2 1 TeamAway IND 95 95.00000
3 2 TeamHome HOU 87 87.00000
4 2 TeamAway LAL 89 95.00000
5 3 TeamHome SAS 94 94.00000
6 3 TeamAway LAL 105 98.33333
7 4 TeamHome MIA 110 110.00000
8 4 TeamAway HOU 111 99.00000
9 5 TeamHome LAL 95 97.50000
10 5 TeamAway NOP 121 121.00000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With