
Pass a variable to dplyr::count in a loop

Tags:

r

dplyr

I'm trying to run dplyr::count() on an arbitrary set of variables in one dataset. If I run count() manually once for each variable, I get the expected results. But when I put count() in a for loop to run it automatically for each variable in a set of variables, I get an error. The problem seems to be in how I pass the variable to count() inside the loop. I know that count() takes its variables unquoted, and for whatever reason R cannot tell that what I am passing it is a variable.

I've tried a number of things to fix this, including passing the variables as data$var1, quo(var1), enquo(var1), var1, "var1", quo(data$var1), and enquo(data$var1), as well as unquoting the iterator with !!. I also tried naming the arguments to count(), as in count(x=data, var=i), but then count() returned the total number of rows in data as the count for every iteration. If you have any ideas about what is causing the error or how I can fix it, I would very much appreciate hearing them!
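A plausible explanation for the total-row-count result, sketched with the built-in mtcars data instead of lakers (an assumption made only so the example runs without lubridate): count() treats a named argument such as var = i as a new grouping column. So var is filled with the string stored in i on every row, everything lands in one group, and n equals the number of rows.

```r
library(dplyr)

i <- "cyl"

# `var = i` does NOT select the column named in i. It creates a new
# column called `var` holding the string "cyl" on every row, so there
# is exactly one group and n is just nrow(mtcars).
constant_count <- count(mtcars, var = i)
print(constant_count)
```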

Here is a minimal reproducible example that relies on the lakers dataset included with lubridate.

# This code requires some of the packages in tidyverse. 
library(dplyr)
library(lubridate)


# results = empty data frame for filling with info from the count() command
results <- data.frame()

# myData = the source data
myData <- lakers

# myCols = character vector of the names of the columns I want to count()
myCols <- c("opponent", "game_type", "player", "period")


# Loop to count() every column in myCols automatically and store the results in 
# one giant tibble of vars (var) and counts (n)

for (i in myCols) {
  results <- bind_rows(results, count(x = myData, i))
}
asked Sep 15 '17 at 16:09 by jozimck




2 Answers

This works:

myData[myCols] %>% tidyr::gather(var, value) %>% count(var, value)

# A tibble: 407 x 3
         var value     n
       <chr> <chr> <int>
 1 game_type  away 17153
 2 game_type  home 17471
 3  opponent   ATL   904
 4  opponent   BOS   886
 5  opponent   CHA   412
 6  opponent   CHI   964
 7  opponent   CLE   822
 8  opponent   DAL  1333
 9  opponent   DEN  1855
10  opponent   DET   845
# ... with 397 more rows
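tidyr::gather() still works, but it has been superseded by tidyr::pivot_longer() since tidyr 1.0.0. A roughly equivalent sketch with pivot_longer() follows. Assumptions: mtcars and a made-up column list stand in for lakers and myCols; values_transform requires a reasonably recent tidyr (1.1.0 or later, to my knowledge) and converts every value to character, since columns of different types, such as opponent (character) and period (integer), otherwise cannot be stacked into a single value column.

```r
library(dplyr)
library(tidyr)

# Stand-ins for the question's objects (assumption: mtcars, not lakers).
myData <- mtcars
myCols <- c("cyl", "gear", "am")

long_counts <- myData[myCols] %>%
  # One row per (column name, value) pair; values become character so
  # columns of different types can share the single `value` column.
  pivot_longer(
    cols = everything(),
    names_to = "var",
    values_to = "value",
    values_transform = list(value = as.character)
  ) %>%
  # Count each distinct value within each original column.
  count(var, value)

print(long_counts)
```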

If you want to pass the names in myCols to count() in the tidyverse (tidy evaluation) style, you'll have to look into the rlang package.
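One common rlang approach converts each string to a symbol with rlang::sym() and splices it into count() with the !! operator. The sketch below assumes mtcars and a made-up column list in place of lakers; results are kept in a named list because each count table has a different first column name, which makes them awkward to stack directly with bind_rows().

```r
library(dplyr)
library(rlang)

# Stand-ins for the question's objects (assumption: mtcars, not lakers).
myData <- mtcars
myCols <- c("cyl", "gear", "am")

results <- list()
for (i in myCols) {
  # sym() turns the string into a symbol and !! injects it, so dplyr
  # effectively sees count(myData, cyl), count(myData, gear), ...
  results[[i]] <- count(myData, !!sym(i))
}

print(results[["cyl"]])
```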

answered Oct 14 '22 at 22:10 by Frank


From the dplyr programming vignette (https://github.com/tidyverse/dplyr/blob/master/vignettes/programming.Rmd):

If you have a character vector of variable names, and want to operate on them with a for loop, index into the special .data pronoun:

for (var in names(mtcars)) {
  mtcars %>% count(.data[[var]]) %>% print()
}
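Applied to the loop in the question, the same .data[[...]] idea might look like the sketch below. Assumptions: mtcars and a made-up myCols stand in for lakers; the counted values are converted to character and stored in a column named value so that counts from columns of different types (for example opponent and period) can be stacked with bind_rows(); mutate(.before = ) needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Stand-ins for the question's objects (assumption: mtcars, not lakers).
myData <- mtcars
myCols <- c("cyl", "gear", "am")

results <- data.frame()
for (i in myCols) {
  counts_i <- myData %>%
    # .data[[i]] looks up the column whose name is stored in i;
    # naming it `value` gives every iteration the same column name.
    count(value = as.character(.data[[i]])) %>%
    # Record which column these counts came from, as the first column.
    mutate(var = i, .before = value)
  results <- bind_rows(results, counts_i)
}

print(results)
```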
answered Oct 14 '22 at 22:10 by sigia