I’m trying to run dplyr::count() on an arbitrary set of variables in one dataset. If I manually run count() once for each variable, I get the expected results. But when I put count() in a for loop to run it automatically for each variable in a set of variables, I get an error. It seems the problem is in how I am passing the variable to count() within the for loop. I know that count() takes its variables unquoted, and for whatever reason R cannot tell that what I am passing it is a variable.
I’ve tried a number of things to fix this, including passing the variables as data$var1, quo(var1), enquo(var1), var1, "var1", quo(data$var1), and enquo(data$var1), as well as unquoting the iterator with !!. I also tried specifying the arguments to count() like count(x=data, var=i), but this caused count() to return the total number of rows in data as the count for each iteration. If you have any ideas about what is causing the error or how I can fix it, I would very much appreciate hearing them!
Here is a minimal reproducible example that relies on the lakers dataset included with lubridate.
# This code requires some of the packages in tidyverse.
library(dplyr)
library(lubridate)
# results = empty data frame for filling with info from the count() command
results <- data.frame()
# myData = the source data
myData <- lakers
# myCols = list of the names of columns I want to count()
myCols <- c("opponent", "game_type", "player", "period")
# Loop to count() every column in myCols automatically and store the results in
# one giant tibble of vars (var) and counts (n)
for (i in myCols) {
  results <- bind_rows(results, count(x = myData, i))
}
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()).
This works:
myData[myCols] %>% tidyr::gather(var, value) %>% count(var, value)
# A tibble: 407 x 3
   var       value     n
   <chr>     <chr> <int>
 1 game_type away  17153
 2 game_type home  17471
 3 opponent  ATL     904
 4 opponent  BOS     886
 5 opponent  CHA     412
 6 opponent  CHI     964
 7 opponent  CLE     822
 8 opponent  DAL    1333
 9 opponent  DEN    1855
10 opponent  DET     845
# ... with 397 more rows
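tidyr::gather() still works but is superseded by pivot_longer(). A rough equivalent of the call above might look like the sketch below. It assumes a tidyr version that has pivot_longer() with the values_transform argument, and it converts every value to character explicitly because period is an integer column while the others are character; the name long_counts is only illustrative.

```r
library(dplyr)
library(tidyr)
library(lubridate)  # provides the lakers dataset

myData <- lakers
myCols <- c("opponent", "game_type", "player", "period")

# Stack the selected columns into (var, value) pairs, then count each pair.
# values_transform coerces everything to character so the mixed column
# types can share one "value" column.
long_counts <- myData[myCols] %>%
  pivot_longer(
    cols = everything(),
    names_to = "var",
    values_to = "value",
    values_transform = list(value = as.character)
  ) %>%
  count(var, value)
```

Each original column contributes one row per distinct value, so the n column sums to four times the number of rows in myData.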
If you want to pass myCols to count() directly, in the tidyverse style, you'll have to look into the rlang package.
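As a hedged illustration of the rlang route, one option is to turn each column name into a symbol with rlang::sym() and unquote it with !!. This is only a sketch of one possible approach, not necessarily what the answer had in mind; the list name counts is made up for the example.

```r
library(dplyr)
library(rlang)      # for sym()
library(lubridate)  # provides the lakers dataset

myCols <- c("opponent", "game_type", "player", "period")

# For each column name, build a symbol and unquote it inside count(),
# so count() sees the column itself rather than a character string.
counts <- lapply(myCols, function(col) {
  count(lakers, !!sym(col))
})
names(counts) <- myCols
```

Here counts[["opponent"]] holds the per-opponent counts, and likewise for the other three columns.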
From https://github.com/tidyverse/dplyr/blob/master/vignettes/programming.Rmd:
If you have a character vector of variable names, and want to operate on them with a for loop, index into the special .data pronoun:
for (var in names(mtcars)) {
  mtcars %>% count(.data[[var]]) %>% print()
}
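Applied to the lakers example from the question, the same .data pattern could look like the sketch below. The value and var column names and the as.character() conversion are assumptions added here so that the per-column results can be stacked into one table with bind_rows(); they are not part of the vignette.

```r
library(dplyr)
library(lubridate)  # provides the lakers dataset

myData <- lakers
myCols <- c("opponent", "game_type", "player", "period")

results <- list()
for (col in myCols) {
  # .data[[col]] looks the column up by its name; as.character() gives every
  # column the same type so the pieces can be row-bound afterwards.
  results[[col]] <- myData %>%
    count(value = as.character(.data[[col]])) %>%
    mutate(var = col) %>%
    select(var, value, n)
}
results <- bind_rows(results)
```

The result has one row per distinct value of each column, with var recording which column the value came from.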