Rolling conditional count in R

Question

I'm wanting to create a rolling function that conditionally counts the occurance of two columns in previous rows.

As an example, I have a dataset that looks like the following.

# Generate data
set.seed(123)
test <- data.frame(
  Round = rep(1:5, times = 3),
  Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
  Venue = sample(sample(c("Venue A", "Venue B"), 15, replace = T))
)

   Round   Team   Venue
1      1 Team 1 Venue B
2      2 Team 1 Venue B
3      3 Team 1 Venue A
4      4 Team 1 Venue A
5      5 Team 1 Venue B
6      1 Team 2 Venue B
7      2 Team 2 Venue B
8      3 Team 2 Venue A
9      4 Team 2 Venue A
10     5 Team 2 Venue A
11     1 Team 3 Venue B
12     2 Team 3 Venue A
13     3 Team 3 Venue B
14     4 Team 3 Venue B
15     5 Team 3 Venue B

I want a new column that shows for each row, the number of times the team in that row has played at the venue in that row in the last 3 rounds.

I can do this quite easily with a for loop.

window <- 3

for (i in 1:nrow(dat)){
  # Create index to search (if i is less than window, start at 1)
  index <- max(i - window, 1):i

  # Search when current row matches both team and venue
  dat$VenueCount[i] <- sum(dat$Team[i] == dat$Team[index] & dat$Venue[i] == dat$Venue[index])
}

   Round   Team   Venue VenueCount
1      1 Team 1 Venue B          1
2      2 Team 1 Venue B          2
3      3 Team 1 Venue A          1
4      4 Team 1 Venue A          2
5      5 Team 1 Venue B          2
6      1 Team 2 Venue B          1
7      2 Team 2 Venue B          2
8      3 Team 2 Venue A          1
9      4 Team 2 Venue A          2
10     5 Team 2 Venue A          3
11     1 Team 3 Venue B          1
12     2 Team 3 Venue A          1
13     3 Team 3 Venue B          2
14     4 Team 3 Venue B          3
15     5 Team 3 Venue B          3

However, I'm wanting to avoid a for loop (mostly as my actual dataset is relatively large at around ~30k rows). I'm thinking it should be doable with one of zoo, dplyr, purrr or apply but haven't been able to work it out.

Thanks

chinsoon12 · Accepted Answer

venturing a data.table solution here. Will take it down if you are only looking for dplyr solution

You can roll using a window of size 4, then count the number of occurrence matching the latest row.

library(data.table)
library(zoo)
setDT(test)
winsize <- 4
test[, .(Round, 
        Venue, 
        VenueCount=rollapplyr(c(rep("", winsize-1), Venue), winsize, 
            function(x) sum(x==last(x)))), 
    by=.(Team)]

result:

#       Team Round   Venue VenueCount
#  1: Team 1     1 Venue B          1
#  2: Team 1     2 Venue B          2
#  3: Team 1     3 Venue A          1
#  4: Team 1     4 Venue A          2
#  5: Team 1     5 Venue B          2
#  6: Team 2     1 Venue B          1
#  7: Team 2     2 Venue B          2
#  8: Team 2     3 Venue A          1
#  9: Team 2     4 Venue A          2
# 10: Team 2     5 Venue A          3
# 11: Team 3     1 Venue B          1
# 12: Team 3     2 Venue A          1
# 13: Team 3     3 Venue B          2
# 14: Team 3     4 Venue B          3
# 15: Team 3     5 Venue B          3

jimmyday87 · Answer

I actually worked out an answer using rollify from the tibbletime package with dplyr::mutate. Will post here but still open to other responses!

library(dplyr)
library(tibbletime)

# Create data
set.seed(123)
test <- data.frame(
  Round = rep(1:5, times = 3),
  Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
  Venue = sample(sample(c("Venue A", "Venue B"), 15, replace = T))
)

Use rollify to create custom function.

last_n_games = 3
count_games <- rollify(function(x) sum(last(x) == x), window = last_n_games)

Now use mutate to run the function. This returns NA for the first 2 rows (i.e. last_n_games - 1). I can then use group_by and row_number to count these first occurrences

test <- test %>%
  group_by(Team) %>%
  mutate(VenueCount = count_games(Venue)) %>%
  group_by(Team, Venue) %>%
  mutate(VenueCount = ifelse(is.na(VenueCount), row_number(Team), VenueCount))

This returns the following

# A tibble: 15 x 4
# Groups:   Team, Venue [6]
   Round Team   Venue   VenueCount
   <int> <fct>  <fct>        <int>
 1     1 Team 1 Venue B          1
 2     2 Team 1 Venue B          2
 3     3 Team 1 Venue A          1
 4     4 Team 1 Venue A          2
 5     5 Team 1 Venue B          1
 6     1 Team 2 Venue B          1
 7     2 Team 2 Venue B          2
 8     3 Team 2 Venue A          1
 9     4 Team 2 Venue A          2
10     5 Team 2 Venue A          3
11     1 Team 3 Venue B          1
12     2 Team 3 Venue A          1
13     3 Team 3 Venue B          2
14     4 Team 3 Venue B          2
15     5 Team 3 Venue B          3

Rolling conditional count in R

Tags:

r

dplyr

jimmyday87

2 Answers

chinsoon12

jimmyday87

Recent Activity

Donate For Us

Rolling conditional count in R

Tags:

r

dplyr

jimmyday87

2 Answers

chinsoon12

jimmyday87

Related questions

Recent Activity

Donate For Us