
printing ggplot with purrr map

Tags:

r

ggplot2

purrr

I want to create ggplots of each numeric column against my response variable.

Here is the reproducible code:

library(tidyverse)  # dplyr, tidyr, purrr, ggplot2 and the mpg data

test = mpg %>% select_if(is.numeric) %>% 
  dplyr::select(-year) %>% nest(-cyl) %>% 
  mutate(ggplots = map(data, ~ggplot(data = .x) + geom_point(aes(x = cyl, y = .x))))

test
# A tibble: 4 x 3
    cyl           data ggplots
  <int> <list<df[,3]>> <list> 
1     4       [81 x 3] <gg>   
2     6       [79 x 3] <gg>   
3     8       [70 x 3] <gg>   
4     5        [4 x 3] <gg>   
Warning message:
All elements of `...` must be named.
Did you want `data = c(displ, cty, hwy)`? 

Getting the error:

test$ggplots[[1]]
Don't know how to automatically pick scale for object of type tbl_df/tbl/data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (81): x, y

What's wrong?

asked Dec 31 '22 at 15:12 by Shery


1 Answer
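To see why the original map() call fails, it helps to look at what .x actually is inside the lambda. The check below is a small sketch that assumes dplyr, tidyr, purrr and ggplot2 are installed; it uses the named form nest(data = -cyl), which avoids the warning shown in the question but nests the data the same way.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)  # also provides the mpg data

# Same nesting as in the question, written with the named form nest(data = -cyl)
nested <- mpg %>%
  select_if(is.numeric) %>%
  select(-year) %>%
  nest(data = -cyl)

# Inside map(), .x is one of these nested data frames, not a single column
first_data <- nested$data[[1]]
class(first_data)   # a tibble, i.e. a whole data frame
names(first_data)   # displ, cty and hwy; cyl is the nesting key, not a column
```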

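Inside your map() call, .x is an entire nested data frame (one per value of cyl), so aes(y = .x) tries to map a whole data frame, rather than a single column, to the y aesthetic; that is what the "object of type tbl_df" message refers to. Also, cyl was used as the nesting key, so it is not a column of the nested data frames, and x = cyl does not give ggplot a column with the 81 rows it expects.

One option when you want to loop through a set of variables and plot each of them against another variable is to loop through the variable names.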

I would first pull out the names of the variables I want on the y axis. I use set_names() at the end of the pipe to name the vector with itself, because that is often useful for organization later: map() keeps those names, so the resulting list of plots is named by variable.

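library(tidyverse)  # dplyr, purrr, and ggplot2 (which provides mpg)

vars = mpg %>%
     select_if(is.numeric) %>%
     select(-cyl, -year) %>%   # keep only the variables that go on the y axis
     names() %>%
     set_names()                # name the vector with itself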

The result is a vector of strings.

vars
# displ     cty     hwy 
# "displ"   "cty"   "hwy" 

Now I can loop through those variable names and plot each one against the fixed x variable, cyl, using purrr::map(). Since I'm working with strings, I need tidy evaluation inside ggplot(), done here with the .data pronoun (I believe this requires rlang 0.4.0 or later). I label the y axis with the variable name in labs(); otherwise the axis label shows the .data pronoun expression.

plots = map(vars, ~ggplot(data = mpg) +
                 geom_point(aes(x = cyl, y = .data[[.x]])) +   # .x is a variable name (a string)
                 labs(y = .x)
)
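Since the question is about printing, here is one way to display the named list of plots. This is a sketch rather than the only approach: it rebuilds vars and plots so it runs on its own, uses purrr::walk() because print() is called only for its side effect, and shows a commented-out iwalk() + ggsave() call whose file names are purely illustrative.

```r
library(dplyr)
library(purrr)
library(ggplot2)

# Rebuild the named vector of y variables and the named list of plots from above
vars <- mpg %>%
  select_if(is.numeric) %>%
  select(-cyl, -year) %>%
  names() %>%
  set_names()

plots <- map(vars, ~ ggplot(data = mpg) +
               geom_point(aes(x = cyl, y = .data[[.x]])) +
               labs(y = .x))

# Because the list is named, a single plot can be pulled out by variable name
plots$hwy

# Print every plot in turn; walk() returns its input invisibly
walk(plots, print)

# Optionally save each plot; iwalk() passes the list name as .y (file names are only an example)
# iwalk(plots, ~ ggsave(filename = paste0(.y, "_vs_cyl.png"), plot = .x))
```

Naming the vector with set_names() is what makes plots$hwy and the .y argument of iwalk() work.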

I demonstrate the approach above in a blog post I wrote last year if you're interested in more explanation.

If you don't want to loop through strings like this, another option is to reshape the dataset into long format and then use the nesting approach. The idea is to take the variables you want on the y axis and stack their values into a single column, which I do with tidyr::pivot_longer(). The y values end up in a column named value, and the original variable names in a column named name.

Then nest the cyl and value columns by variable name. That gives a three-row dataset, one row per y variable, and you can loop through the nested datasets in mutate() to create a column of plots, as in your original attempt.

plots2 = mpg %>%
     select_if(is.numeric) %>% 
     dplyr::select(-year) %>% 
     pivot_longer(cols = -cyl) %>%   # columns: cyl, name, value
     nest(data = -name) %>%          # one nested data frame (cyl, value) per y variable
     mutate(ggplots = map(data, 
                          ~ggplot(data = .x) + geom_point(aes(x = cyl, y = value))))
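With the nested approach every y axis is labelled value. One possible variant, offered as a sketch, uses purrr::map2() to pass each variable name alongside its nested data so it can go into labs(), then prints the plots; the object name plots_list is just illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

# Long format, then one nested data frame (cyl, value) per y variable
plots2 <- mpg %>%
  select_if(is.numeric) %>%
  select(-year) %>%
  pivot_longer(cols = -cyl) %>%
  nest(data = -name) %>%
  # map2() passes each nested data frame as .x and its variable name as .y
  mutate(ggplots = map2(data, name,
                        ~ ggplot(data = .x) +
                          geom_point(aes(x = cyl, y = value)) +
                          labs(y = .y)))

# Name the plots by variable so they can be printed or pulled out by name
plots_list <- set_names(plots2$ggplots, plots2$name)
walk(plots_list, print)
```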
answered Jan 08 '23 at 01:01 by aosmith