
Call many variables in a for loop with dplyr/ggplot function

Sometimes, when performing exploratory analysis or producing reports, we want to plot univariate distributions for many variables. I could do this by faceting the plot after some tidying, but some of the variables are ordered factors and I want to keep them ordered in the plots.

So, to do this more efficiently, I built a simple dplyr/ggplot-based function. The example below uses the Arthritis dataset from the vcd package.

library(dplyr)
library(ggplot2)

data(Arthritis, package = "vcd")

head(Arthritis)

plotUniCat <- function(df, x) {
  x <- enquo(x)
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y=prop, x=!!x)) +
    geom_bar(stat = "identity")
}

plotUniCat(Arthritis, Improved)

I can plot a formatted graph in a very short way, which is cool, but with just one variable.

I tried to call more than one variable with a for loop, but it's not working. The code runs, but nothing happens.

variables <- c("Improved", "Sex", "Treatment")

for (i in variables) {
  plotUniCat(Arthritis, noquote(i))
}

I searched for this, but it's still not clear to me. Does someone know what I am doing wrong or how to make it work?

Thanks in advance.

Bruno Pinheiro asked Jun 14 '18 01:06


3 Answers

You need to use rlang::sym instead of enquo to convert the strings to symbols. I also replaced the for loop with purrr::map to loop through the variables.

library(tidyverse)

data(Arthritis, package = "vcd")

head(Arthritis)
#>   ID Treatment  Sex Age Improved
#> 1 57   Treated Male  27     Some
#> 2 46   Treated Male  29     None
#> 3 77   Treated Male  30     None
#> 4 17   Treated Male  32   Marked
#> 5 36   Treated Male  46   Marked
#> 6 23   Treated Male  58   Marked

plotUniCat2 <- function(df, x) {
  x <- rlang::sym(x)
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y=prop, x=!!x)) +
    geom_bar(stat = "identity")
}

variables <- c("Improved", "Sex", "Treatment")

variables %>% purrr::map(., ~ plotUniCat2(Arthritis, .x))
#> [[1]]

#> 
#> [[2]]

#> 
#> [[3]]

Created on 2018-06-13 by the reprex package (v0.2.0).
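
To see why sym helps here, it may be useful to look at what enquo actually captures. The sketch below is only an illustration: capture() is a hypothetical helper that is not part of the original code, and it needs nothing beyond rlang. With noquote(i), enquo captures the call noquote(i) itself rather than a column name; unquoting a symbol built with sym yields the bare column name instead.

```r
library(rlang)

# Hypothetical helper (not from the thread): returns whatever enquo() captures
capture <- function(x) enquo(x)

i <- "Improved"

# enquo() captures the expression as written, i.e. the call noquote(i)
q1 <- capture(noquote(i))
quo_get_expr(q1)

# sym() turns the string into a symbol; !! injects it as the captured expression
s <- sym(i)
q2 <- capture(!!s)
quo_get_expr(q2)
```

In the first case dplyr would evaluate noquote(i) to a constant string rather than look up the Improved column, which is why the original loop could not produce the intended plot.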

Tung answered Nov 05 '22 06:11


Change the enquo in the function to sym, to convert the variable string to a symbol. That is,

plotUniCat <- function(df, x) {
  x <- sym(x)
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y=prop, x=!!x)) +
    geom_bar(stat = "identity")
}

or, more concisely (note that this version plots counts rather than proportions),

plotUniCat <- function(df, x) {
  x <- sym(x)
  df %>%
    filter(!is.na(!!x)) %>%
    ggplot(aes(x = as.factor(!!x))) +
    # geom_bar() counts rows per category by default
    geom_bar()
}

and then

out <- lapply(variables, function(i) plotUniCat(Arthritis,i))

Finally, use grid.arrange to display the plots. E.g.

library(gridExtra)
do.call(grid.arrange, c(out, ncol = 2))


Weihuang Wong answered Nov 05 '22 05:11


I guess the OP would like plotUniCat to work with both quoted and unquoted variable names. If we change the function to use sym, it will no longer work for plotUniCat(Arthritis, Improved).

Therefore, instead of changing the function, we can change how we call plotUniCat:

for (i in variables) {
    plotUniCat(Arthritis, !!rlang::sym(i))
}

The plots are now generated, but a for loop does not auto-print the values it computes, so nothing is displayed. Wrap the call in print to force the display, or use lapply to collect the generated plots:

lapply(variables, function(i) plotUniCat(Arthritis, !!rlang::sym(i)))
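
As a minimal sketch of the print approach, the loop below both displays each plot and keeps it in a list. To stay self-contained it uses a small made-up data frame (toy, an assumption standing in for Arthritis) and repeats the OP's enquo-based plotUniCat unchanged apart from spacing; the plots list is likewise just an illustrative name.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for Arthritis (not from the thread)
toy <- data.frame(
  Improved = factor(c("None", "Some", "Marked", "None"),
                    levels = c("None", "Some", "Marked"), ordered = TRUE),
  Sex = factor(c("Male", "Female", "Female", "Male"))
)

# The OP's function, still based on enquo()
plotUniCat <- function(df, x) {
  x <- enquo(x)
  df %>%
    filter(!is.na(!!x)) %>%
    count(!!x) %>%
    mutate(prop = prop.table(n)) %>%
    ggplot(aes(y = prop, x = !!x)) +
    geom_bar(stat = "identity")
}

variables <- c("Improved", "Sex")
plots <- list()

for (i in variables) {
  p <- plotUniCat(toy, !!rlang::sym(i))
  print(p)          # a for loop does not auto-print, so print explicitly
  plots[[i]] <- p   # keep the plot for later use, e.g. with grid.arrange
}
```

Because the function itself is untouched, a call with a bare column name such as plotUniCat(toy, Improved) should keep working alongside the loop.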
mt1022 answered Nov 05 '22 06:11