I want to create ggplots for numeric cols against my response variable.
Here is the reproducible code:
test = mpg %>%
     select_if(is.numeric) %>%
     dplyr::select(-year) %>%
     nest(-cyl) %>%
     mutate(ggplots = map(data, ~ggplot(data = .x) + geom_point(aes(x = cyl, y = .x))))
test
# A tibble: 4 x 3
cyl data ggplots
<int> <list<df[,3]>> <list>
1 4 [81 x 3] <gg>
2 6 [79 x 3] <gg>
3 8 [70 x 3] <gg>
4 5 [4 x 3] <gg>
Warning message:
All elements of `...` must be named.
Did you want `data = c(displ, cty, hwy)`?
Printing one of the plots gives an error:
test$ggplots[[1]]
Don't know how to automatically pick scale for object of type tbl_df/tbl/data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (81): x, y
What's wrong?
One option when we want to loop through a bunch of variables and plot each of them against another variable is to loop through the variable names.
I would first pull out the variable names I want on the y axis. I use set_names() at the end of the pipe to name the vector with itself, because sometimes I need that for organization later.
library(tidyverse)  # loads dplyr, purrr, tidyr and ggplot2 (which ships the mpg data)

vars = mpg %>%
     select_if(is.numeric) %>%
     select(-cyl, -year) %>%
     names() %>%
     set_names()
The result is a vector of strings.
vars
# displ cty hwy
# "displ" "cty" "hwy"
Now I can loop through those variable names and make a plot against the fixed x variable cyl. I'll use a purrr::map() loop for this. Since I'm working with strings I need to use tidy evaluation within ggplot(), done with the .data pronoun (I believe this only works since the latest 0.4.0 release of rlang). I label the y axis with the variable in labs(), otherwise it has the .data pronoun in the axis label.
plots = map(vars, ~ggplot(data = mpg) +
                 geom_point(aes(x = cyl, y = .data[[.x]])) +
                 labs(y = .x)
)
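Because vars was named with set_names(), the list returned by map() carries the same names, so a single plot can be pulled out by variable name. A minimal sketch of that, assuming the tidyverse is installed; it rebuilds vars and plots so it runs on its own:

```r
library(tidyverse)

# Named character vector of the y variables (names equal the values)
vars = mpg %>%
     select_if(is.numeric) %>%
     select(-cyl, -year) %>%
     names() %>%
     set_names()

# One scatterplot per y variable, each against cyl
plots = map(vars, ~ggplot(data = mpg) +
                 geom_point(aes(x = cyl, y = .data[[.x]])) +
                 labs(y = .x))

names(plots)   # the list inherits its names from vars
plots$hwy      # prints the plot of hwy against cyl
```

Without set_names() the list would be unnamed and the plots would have to be indexed by position, e.g. plots[[3]].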
I demonstrate the approach above in a blog post I wrote last year if you're interested in more explanation.
If you don't want to loop through strings like this, another option is to reshape the dataset into a long format and then use the nesting approach. The idea is to make a long dataset, taking the variables you want on the y axis and putting their values all together in a single column. I do this with tidyr::pivot_longer(). The numeric values for the y variables are now in a single column, named value.
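To see what the reshaping step produces on its own, here is a small sketch of just the pivot, assuming the tidyverse is installed. The object name mpg_long is only for illustration and is not used elsewhere in this answer:

```r
library(tidyverse)

# Keep the numeric columns, drop year, then stack every column except cyl.
# pivot_longer() uses "name" and "value" as its default output column names.
mpg_long = mpg %>%
     select_if(is.numeric) %>%
     dplyr::select(-year) %>%
     pivot_longer(cols = -cyl)

names(mpg_long)   # cyl, name, value
nrow(mpg_long)    # 3 stacked variables x 234 rows of mpg
```

Each original row of mpg now appears three times, once per y variable, with the variable's name in name and its measurement in value.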
Then nest the cyl and value columns for each variable name. Once that is done you'll have a three row dataset, one row per y variable, and you can loop through the datasets in mutate() to create your column of plots as in your original attempt.
plots2 = mpg %>%
     select_if(is.numeric) %>%
     dplyr::select(-year) %>%
     pivot_longer(cols = -cyl) %>%
     nest(data = -name) %>%
     mutate(ggplots = map(data,
                          ~ggplot(data = .x) + geom_point(aes(x = cyl, y = value))))
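With the nested approach every plot has its y axis labeled value, since that is the column being mapped. One possible variation, which is my own assumption about what you might want rather than something you asked for, is to loop over the data and name columns together with purrr::map2() and pass the name to labs(). The object name plots3 is just for illustration:

```r
library(tidyverse)

plots3 = mpg %>%
     select_if(is.numeric) %>%
     dplyr::select(-year) %>%
     pivot_longer(cols = -cyl) %>%
     nest(data = -name) %>%
     # .x is the nested dataset, .y is the matching variable name
     mutate(ggplots = map2(data, name,
                           ~ggplot(data = .x) +
                                geom_point(aes(x = cyl, y = value)) +
                                labs(y = .y)))

plots3$name           # one row per y variable
plots3$ggplots[[1]]   # prints the plot for the first variable in name
```

The plots live in a list column, so they are pulled out by position with [[ ]], or by filtering on name first.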