library(ggplot2)
library(dplyr)
library(scales)
data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))
data %>%
group_by(THEME_NAME) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
# THE NEXT LINE !!! #
ggplot(., aes(x = reorder(THEME_NAME, desc(freq)), y = freq)) +
geom_bar(stat="identity") +
scale_y_continuous(labels=percent)
How can I refer to THEME_NAME programmatically? I can do .$THEME_NAME, but I'd like to refer to it as .[1] or select(., 1) or something of that nature.
The reason is that I'd like to use this pipeline in a bigger context, such as passing a bunch of factor variables through it. Something like vars.to.plot <- sapply(data, is.factor), and then running each element of vars.to.plot through this pipeline.
All of the dplyr functions take a data frame (or tibble) as the first argument. Rather than forcing the user to either save intermediate objects or nest functions, dplyr provides the %>% operator from magrittr. x %>% f(y) turns into f(x, y) so the result from one step is then “piped” into the next step.
select() and rename(): For choosing variables and using their names as a base for doing so.
n() gives the current group size. cur_data() gives the current data for the current group (excluding grouping variables).
So you need to set up a variable to hold the name of the grouping variable, because the "group by" variable information apparently isn't preserved in the tbl_df object after the summarize() call. You could do this:
varname<-"THEME_NAME"
data %>%
group_by_(varname) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
ggplot(eval(bquote(aes(x=reorder(.(as.name(varname)), desc(freq)), y=freq)))) +
geom_bar(stat="identity") +
scale_y_continuous(labels=percent)
Here we use bquote() to dynamically build the aes() call. This is only necessary because of the reorder() step you want to do. Otherwise it would be much easier with aes_string() or something similar.
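In current releases of dplyr and ggplot2, group_by_() and aes_string() are deprecated. As a rough sketch of the same idea without them, assuming dplyr 1.0 or later and ggplot2 3.0 or later, the column name can stay in a string and be looked up with across(all_of()) and the .data pronoun. The name freq_tbl and the xlab() call are additions for illustration, not part of the original answer.

```r
library(ggplot2)
library(dplyr)
library(scales)

data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))
varname <- "THEME_NAME"

# Group by the column whose name is stored in `varname`
freq_tbl <- data %>%
  group_by(across(all_of(varname))) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(freq = n / sum(n))

# .data[[varname]] looks the column up by name inside aes(),
# so reorder() can be used without building the call by hand
p <- ggplot(freq_tbl, aes(x = reorder(.data[[varname]], desc(freq)), y = freq)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = percent) +
  xlab(varname)  # otherwise the axis title would show the reorder() expression
```

Printing p draws the chart. Only varname has to change to plot a different column.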
If you always wanted to re-order based on the first column (meaning you would never group by more than one variable), you could do
data %>%
group_by(THEME_NAME) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
{ggplot(., eval(substitute(aes(x=reorder(X, desc(freq)), y=freq), list(X=as.name(names(.)[1]))))) +
geom_bar(stat="identity") +
scale_y_continuous(labels=percent)}
which doesn't require
As far as I can tell this must be done in three parts. There are a few limitations I discovered that I would appreciate someone correcting if I am mistaken.
data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))
my_var <- names(data)[1]
df <- data %>%
group_by_(my_var) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
arrange(desc(freq))
df[[1]] <- factor(df[[1]], levels = unique(df[[1]]))
ggplot(df, aes_string(x = my_var, y = "freq")) +
geom_bar(stat="identity") +
scale_y_continuous(labels=percent)
Trying to have it all in one call, I ran into these problems:
First, there seems to be no way to stop ggplot from ordering the x-axis automatically without resetting the levels of your variable prior to the call. The only way within the ggplot call is with reorder, which cannot, to my knowledge, be used with aes_string.
Second, strings cannot be used with mutate to reset the levels. One would need to use the s_mutate function from dplyrExtras to use strings, but resetting levels from the piped dataset doesn't appear to work with strings. With mutate the statement would look like this (which works BTW):
mutate(THEME_NAME = factor(THEME_NAME, levels=unique(THEME_NAME)))
but with the string accepting version the levels remain the same:
s_mutate(my_var = factor(my_var, levels = unique(my_var)))
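With dplyr 1.0 or later, one way around the string limitation is across(all_of(my_var), ...), which applies a function to a column named by a string. The sketch below wraps the whole pipeline in a function and runs it over every factor column, as the question describes. The helper plot_freq, the demo data frame and its REGION and score columns are hypothetical names made up for this example. It also assumes no plotted column is itself called n or freq.

```r
library(ggplot2)
library(dplyr)
library(scales)

# Hypothetical helper: relative-frequency bar chart for one column,
# with bars ordered from most to least frequent
plot_freq <- function(df, varname) {
  freq_tbl <- df %>%
    group_by(across(all_of(varname))) %>%
    summarise(n = n(), .groups = "drop") %>%
    mutate(freq = n / sum(n)) %>%
    arrange(desc(freq)) %>%
    # Reset the levels to the sorted order, addressing the column by string
    mutate(across(all_of(varname), ~ factor(.x, levels = unique(.x))))

  ggplot(freq_tbl, aes(x = .data[[varname]], y = freq)) +
    geom_bar(stat = "identity") +
    scale_y_continuous(labels = percent) +
    xlab(varname)
}

# Made-up example data with two factor columns and one numeric column
demo <- data.frame(
  THEME_NAME = factor(c(rep("A", 10), rep("B", 20), rep("C", 15))),
  REGION = factor(rep(c("north", "south", "east"), times = 15)),
  score = seq_len(45)
)

# sapply() returns a logical vector, so convert it to column names first
vars.to.plot <- names(demo)[sapply(demo, is.factor)]
plots <- lapply(vars.to.plot, plot_freq, df = demo)
names(plots) <- vars.to.plot
```

Each element of plots is a ggplot object, so plots$THEME_NAME or plots[["REGION"]] can be printed individually.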