How to refer to a data.frame variable in a dplyr pipeline via . programmatically?

Q: Does dplyr work with data frames?

All of the dplyr functions take a data frame (or tibble) as the first argument. Rather than forcing the user to either save intermediate objects or nest functions, dplyr provides the %>% operator from magrittr. x %>% f(y) turns into f(x, y) so the result from one step is then “piped” into the next step.
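Here is a small sketch using the question's data, assuming only dplyr is installed. The piped form, the nested form, and the piped form with magrittr's explicit `.` placeholder all produce the same summary table:

```r
library(dplyr)

data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))

# Piped form: each result becomes the first argument of the next call
piped <- data %>%
  group_by(THEME_NAME) %>%
  summarise(n = n())

# Nested form: x %>% f(y) is evaluated as f(x, y)
nested <- summarise(group_by(data, THEME_NAME), n = n())

# The "." placeholder names the piped value explicitly
dotted <- data %>%
  group_by(., THEME_NAME) %>%
  summarise(., n = n())

print(piped)
```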

Tags:

r

dplyr

ggplot2

library(ggplot2)
library(dplyr)
library(scales)

data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))

data %>%
  group_by(THEME_NAME) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  # THE NEXT LINE !!! #
  ggplot(., aes(x = reorder(THEME_NAME, desc(freq)), y = freq)) +
    geom_bar(stat="identity") +
    scale_y_continuous(labels=percent)

How can I refer to THEME_NAME programmatically? I can do .$THEME_NAME, but I'd like to refer to it as .[1] or select(., 1), or something of that nature.

The reason is that I'd like to use this pipeline in a bigger context, such as passing a bunch of factor variables through it. Something like vars.to.plot <- sapply(data, is.factor), and then running each element of vars.to.plot through this pipeline.
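One detail in that plan: sapply(data, is.factor) returns a named logical vector, not the column names, so the names have to be pulled out of it. Also, since R 4.0.0, data.frame() no longer converts strings to factors by default, so this sketch passes stringsAsFactors = TRUE explicitly. The extra OTHER and n_obs columns are hypothetical, added only to make the selection visible.

```r
# Hypothetical data; stringsAsFactors = TRUE turns the character columns into factors
data <- data.frame(
  THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)),
  OTHER      = rep(c("x", "y", "z"), 15),
  n_obs      = seq_len(45),
  stringsAsFactors = TRUE
)

is_fac <- sapply(data, is.factor)     # named logical vector, one entry per column
vars.to.plot <- names(data)[is_fac]   # character vector of factor column names
print(vars.to.plot)
```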


asked Feb 13 '15 18:02

JasonAizkalns

2 Answers

You need to set up a variable to hold the name of the grouping variable, because the "group by" information apparently isn't preserved in the tbl_df object after the summarize() call. You could do this:

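varname <- "THEME_NAME"

data %>%
  group_by_(varname) %>%   # string-based group_by(); deprecated since dplyr 0.7.0
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  # bquote() splices the column symbol into aes(); eval() then builds the mapping
  ggplot(eval(bquote(aes(x=reorder(.(as.name(varname)), desc(freq)), y=freq)))) +
    geom_bar(stat="identity") +
    scale_y_continuous(labels=percent)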

Here we use bquote() to build the aes() call dynamically. This is only necessary because of the reorder() step you want; otherwise it would be much easier with aes_string() or something similar.
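To see what bquote() produces, it helps to build the expression without evaluating it. In this sketch, .() splices the symbol created by as.name(varname) into the otherwise quoted aes() call. Nothing is evaluated, so ggplot2 does not even need to be loaded:

```r
varname <- "THEME_NAME"

# bquote() quotes its argument but evaluates whatever is wrapped in .()
expr <- bquote(aes(x = reorder(.(as.name(varname)), desc(freq)), y = freq))

print(expr)
# eval(expr) would then call ggplot2::aes() with the real column name in place
```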

If you always wanted to re-order based on the first column (meaning you would never group by more than one variable), you could do

data %>%
  group_by(THEME_NAME) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  {ggplot(., eval(substitute(aes(x=reorder(X, desc(freq)), y=freq), list(X=as.name(names(.)[1])))))  +
    geom_bar(stat="identity") +
    scale_y_continuous(labels=percent)}

which doesn't require
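In more recent versions of dplyr (0.7.0 and later) and ggplot2 (3.0.0 and later), tidy evaluation can replace both group_by_() and the bquote() step. !!sym(varname) injects a column name given as a string, and the .data pronoun can be used inside aes(), including within reorder(). The sketch below wraps the pipeline in a function (plot_freq is a hypothetical name) and loops over the factor columns, as the question intended. It uses geom_col(), ggplot2's shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)
library(dplyr)
library(scales)

# Hypothetical helper: bar chart of relative frequencies for one column,
# with bars ordered from most to least frequent
plot_freq <- function(df, varname) {
  df %>%
    group_by(!!sym(varname)) %>%   # string -> column symbol
    summarise(n = n()) %>%
    mutate(freq = n / sum(n)) %>%
    ggplot(aes(x = reorder(.data[[varname]], desc(freq)), y = freq)) +
    geom_col() +
    scale_y_continuous(labels = percent) +
    labs(x = varname)
}

data <- data.frame(
  THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)),
  stringsAsFactors = TRUE
)

# Run every factor column through the same pipeline
vars.to.plot <- names(data)[sapply(data, is.factor)]
plots <- lapply(vars.to.plot, function(v) plot_freq(data, v))
print(plots[[1]])
```

Passing the column name as a string keeps the function usable from lapply() or a loop, which is where .[1]-style positional access tends to get awkward.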


answered Sep 23 '22 18:09

MrFlick

As far as I can tell, this must be done in three parts. I ran into a few limitations, and I would appreciate someone correcting me if I am mistaken.

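data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))
my_var <- names(data)[1]

# 1. Summarise, sorting rows by descending frequency
df <- data %>%
  group_by_(my_var) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  arrange(desc(freq))

# 2. Reset the factor levels to the sorted row order so ggplot keeps it
df[[1]] <- factor(df[[1]], levels = unique(df[[1]]))

# 3. Plot, referring to the grouping column by its name as a string
ggplot(df, aes_string(x = my_var, y = "freq")) +
  geom_bar(stat="identity") +
  scale_y_continuous(labels=percent)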

Trying to do it all in one call, I ran into these problems:

1. There is no way to prevent ggplot from ordering the x-axis automatically without resetting the levels of your variable prior to the call. The only way within the ggplot call is with reorder, which cannot, to my knowledge, be used with aes_string.
2. Another idea I had was to use mutate to reset the levels. One would need the s_mutate function from dplyrExtras to use strings, but resetting levels from the piped dataset doesn't appear to work with strings.

With mutate, the statement would look like this (which works, BTW):

mutate(THEME_NAME = factor(THEME_NAME, levels=unique(THEME_NAME)))

but with the string-accepting version, the levels remain the same:

s_mutate(my_var = factor(my_var, levels = unique(my_var)))
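For the second limitation, dplyr 0.7.0 and later can do this with an ordinary mutate(). The rlang-style `:=` operator with `!!` lets the left-hand side be a column name held in a string, and .data[[my_var]] reads that column on the right-hand side. This sketch, using the question's data, keeps the summarising and releveling in one pipeline:

```r
library(ggplot2)
library(dplyr)
library(scales)

data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))
my_var <- names(data)[1]

df <- data %>%
  group_by(!!sym(my_var)) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  arrange(desc(freq)) %>%
  # Relevel the string-named column so its levels follow the sorted rows
  mutate(!!my_var := factor(.data[[my_var]], levels = unique(.data[[my_var]])))

p <- ggplot(df, aes(x = .data[[my_var]], y = freq)) +
  geom_col() +
  scale_y_continuous(labels = percent)
print(p)
```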

answered Sep 25 '22 18:09

cdeterman
