I have a dataset of soccer match results, and I am hoping to learn R by creating a running set of ratings similar to the World Football Elo formula. I am running into trouble because things that seem simple in Excel aren't exactly intuitive in R. For instance, here are the first 15 of 4270 observations with the necessary variables:
date t.1 t.2 m.result
1 19960406 DC SJ 0.0
2 19960413 COL KC 0.0
3 19960413 NE TB 0.0
4 19960413 CLB DC 1.0
5 19960413 LAG NYRB 1.0
6 19960414 FCD SJ 0.5
7 19960418 FCD KC 1.0
8 19960420 NE NYRB 1.0
9 19960420 DC LAG 0.0
10 19960420 CLB TB 0.0
11 19960421 COL FCD 1.0
12 19960421 SJ KC 0.5
13 19960427 CLB NYRB 1.0
14 19960427 DC NE 0.5
15 19960428 FCD TB 1.0
I want to be able to create a new variable that will be a running count of t.1 and t.2's total matches played (i.e., the instances up to the date in question that "DC" occurs in columns t.1 or t.2):
date t.1 t.2 m.result ##t.1m ##t.2m
1 19960406 DC SJ 0.0 1 1
2 19960413 COL KC 0.0 1 1
3 19960413 NE TB 0.0 1 1
4 19960413 CLB DC 1.0 1 2
5 19960413 LAG NYRB 1.0 1 1
6 19960414 FCD SJ 0.5 1 2
7 19960418 FCD KC 1.0 2 2
8 19960420 NE NYRB 1.0 2 2
9 19960420 DC LAG 0.0 3 2
10 19960420 CLB TB 0.0 2 2
11 19960421 COL FCD 1.0 2 3
12 19960421 SJ KC 0.5 3 3
13 19960427 CLB NYRB 1.0 3 3
14 19960427 DC NE 0.5 4 3
15 19960428 FCD TB 1.0 4 3
In Excel, this is a (relatively) simple =SUMPRODUCT formula, e.g.:
E4=SUMPRODUCT((A:A<=A4)*(B:B=B4))+SUMPRODUCT((A:A<=A4)*(C:C=B4))
where E4 is t.1m for obs # 4, A:A is Date, B:B is t.1, C:C is t.2, etc.
In R, I can get the total count printed for me (e.g., "DC" has played 576 games across my dataset), but for some reason (probably that I'm new, impatient, and rattled by trial and error) I'm lost on how to compute a running count over the observations, and especially how to store that running count as a variable, which is vital for any game rating index. I know 'PlayerRatings' exists, but I feel that for my R education I should be able to do this in R without that package. plyr or dplyr is okay, of course.
For reference, here is my data for you to copy/paste into R.
date<-c(19960406,19960413,19960413,19960413,19960413,19960414,19960418,19960420,19960420,19960420,19960421,19960421,19960427,19960427,19960428)
t.1<-c("DC","COL","NE","CLB","LAG","FCD","FCD","NE","DC","CLB","COL","SJ","CLB","DC","FCD")
t.2<-c("SJ","KC","TB","DC","NYRB","SJ","KC","NYRB","LAG","TB","FCD","KC","NYRB","NE","TB")
m.result<-c(0.0,0.0,0.0,1.0,1.0,0.5,1.0,1.0,0.0,0.0,1.0,0.5,1.0,0.5,1.0)
mtable<-data.frame(date,t.1,t.2,m.result)
mtable
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()).
You can use base R to build a condition and count how many entries in a column satisfy it. If you are an Excel user, this is similar to the COUNTIF function.
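As a rough sketch of that COUNTIF idea applied to the question's data: summing a logical vector counts its TRUE values, and wrapping the same condition in sapply() gives a running count that mirrors the SUMPRODUCT formula. The helper name running_count is made up for illustration, and the loop compares every row against every other row, so it may be slow on large data.

```r
# Data copied from the question.
date <- c(19960406, 19960413, 19960413, 19960413, 19960413, 19960414, 19960418,
          19960420, 19960420, 19960420, 19960421, 19960421, 19960427, 19960427,
          19960428)
t.1 <- c("DC", "COL", "NE", "CLB", "LAG", "FCD", "FCD", "NE", "DC", "CLB",
         "COL", "SJ", "CLB", "DC", "FCD")
t.2 <- c("SJ", "KC", "TB", "DC", "NYRB", "SJ", "KC", "NYRB", "LAG", "TB",
         "FCD", "KC", "NYRB", "NE", "TB")

# COUNTIF-style total: TRUE counts as 1 when summed.
dc_total <- sum(t.1 == "DC" | t.2 == "DC")

# Running version (hypothetical helper): for row i, count the rows dated on or
# before date[i] in which `team` appears in either team column.
running_count <- function(team, i) {
  sum(date <= date[i] & (t.1 == team | t.2 == team))
}

t.1m <- sapply(seq_along(date), function(i) running_count(t.1[i], i))
t.2m <- sapply(seq_along(date), function(i) running_count(t.2[i], i))
```

Like the Excel formula, this compares dates with <=, so every match played on the same date is included in the count.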
You can also use the group_by() function together with summarise() from the dplyr package to get a count per group: group_by() returns a grouped_df (a grouped data frame), and calling summarise() on it yields the count for each group. To use these functions, you first have to install dplyr.
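To make the count() and group_by()/summarise() descriptions concrete, here is a small sketch. It assumes dplyr is installed, and the data frame names (matches, appearances) are invented for illustration. Note that both forms return totals per team, not the running count the question asks for.

```r
library(dplyr)

# Small illustrative data frame (first four matches from the question).
matches <- data.frame(
  t.1 = c("DC", "COL", "NE", "CLB"),
  t.2 = c("SJ", "KC", "TB", "DC"),
  stringsAsFactors = FALSE
)

# Stack both team columns so that every appearance of a team is one row.
appearances <- data.frame(
  team = c(matches$t.1, matches$t.2),
  stringsAsFactors = FALSE
)

# Shorthand form.
via_count <- appearances %>% count(team)

# Longer form that count() is roughly equivalent to.
via_summarise <- appearances %>%
  group_by(team) %>%
  summarise(n = n())
```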
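In your data creation step, make sure to pass stringsAsFactors = FALSE to data.frame() to avoid issues with factor comparisons (this is already the default from R 4.0.0 onwards). Then it's easy to do. (Edit: I made this an all-dplyr example.)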
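library(dplyr)
# Sort once up front so the row ids used below match the rows of mtable
mtable <- mtable %>% arrange(date)
# Count how often the team in row `id`, column `var` appears in t.1 or t.2 of rows 1..id
cross_count <- function(id, var) {
  length(which(mtable[id, var] == mtable[1:id, ] %>% select(t.1, t.2) %>% unlist))
}
mtable %>%
  mutate(id = row_number()) %>%
  rowwise() %>% # evaluate cross_count() one row at a time
  mutate(t.1m = cross_count(id, 2), t.2m = cross_count(id, 3)) %>%
  ungroup()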
date t.1 t.2 m.result id t.1m t.2m
1 19960406 DC SJ 0.0 1 1 1
2 19960413 COL KC 0.0 2 1 1
3 19960413 NE TB 0.0 3 1 1
4 19960413 CLB DC 1.0 4 1 2
5 19960413 LAG NYRB 1.0 5 1 1
6 19960414 FCD SJ 0.5 6 1 2
7 19960418 FCD KC 1.0 7 2 2
8 19960420 NE NYRB 1.0 8 2 2
9 19960420 DC LAG 0.0 9 3 2
10 19960420 CLB TB 0.0 10 2 2
11 19960421 COL FCD 1.0 11 2 3
12 19960421 SJ KC 0.5 12 3 3
13 19960427 CLB NYRB 1.0 13 3 3
14 19960427 DC NE 0.5 14 4 3
15 19960428 FCD TB 1.0 15 4 3
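For the full 4270 rows, a vectorized variant may be faster than calling a function once per row. The following is only a sketch in base R, not part of the answer above: it interleaves the two team columns in match order and numbers each team's appearances with ave(). It assumes no team plays twice on the same date, because it counts by row order rather than by comparing dates. The names teams and appearance are illustrative.

```r
# Data copied from the question (m.result omitted, it is not needed here).
date <- c(19960406, 19960413, 19960413, 19960413, 19960413, 19960414, 19960418,
          19960420, 19960420, 19960420, 19960421, 19960421, 19960427, 19960427,
          19960428)
t.1 <- c("DC", "COL", "NE", "CLB", "LAG", "FCD", "FCD", "NE", "DC", "CLB",
         "COL", "SJ", "CLB", "DC", "FCD")
t.2 <- c("SJ", "KC", "TB", "DC", "NYRB", "SJ", "KC", "NYRB", "LAG", "TB",
         "FCD", "KC", "NYRB", "NE", "TB")
mtable <- data.frame(date, t.1, t.2, stringsAsFactors = FALSE)

# Make sure the matches are in chronological order.
mtable <- mtable[order(mtable$date), ]
n <- nrow(mtable)

# Interleave the team columns: row 1 t.1, row 1 t.2, row 2 t.1, row 2 t.2, ...
teams <- as.vector(t(as.matrix(mtable[, c("t.1", "t.2")])))

# Within each team, number its appearances 1, 2, 3, ... in order.
appearance <- ave(seq_along(teams), teams, FUN = seq_along)

# Odd positions belong to t.1, even positions to t.2.
mtable$t.1m <- appearance[seq(1, 2 * n, by = 2)]
mtable$t.2m <- appearance[seq(2, 2 * n, by = 2)]
```

On the 15 sample rows this reproduces the t.1m and t.2m columns shown above.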