
How to refer to a data.frame variable in a dplyr pipeline via . programmatically?

Tags: r, dplyr, ggplot2

library(ggplot2)
library(dplyr)
library(scales)

data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))

data %>%
  group_by(THEME_NAME) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  # THE NEXT LINE !!! #
  ggplot(., aes(x = reorder(THEME_NAME, desc(freq)), y = freq)) +
    geom_bar(stat="identity") +
    scale_y_continuous(labels=percent)

How can I refer to THEME_NAME programmatically? I can do .$THEME_NAME, but I'd like to refer to it as .[1], select(., 1), or something of that nature.

The reason for this is that I'd like to use this pipeline in a bigger context, such as passing a bunch of factor variables through it. Something like: vars.to.plot <- sapply(data, is.factor), then running each element of vars.to.plot through this pipeline.
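A minimal sketch of that bigger context, assuming dplyr 1.0.0 or later and ggplot2 3.0.0 or later, where the `.data` pronoun lets you refer to a column whose name is stored in a string, even inside reorder(). The helper names freq_table() and plot_freq() are hypothetical. The column filter also accepts character columns, because since R 4.0.0 data.frame() no longer converts strings to factors by default.

```r
library(dplyr)
library(ggplot2)
library(scales)

data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))

# Hypothetical helper: relative frequency table for the column whose
# *name* is given as a string. count(.data[[var]]) keeps that column name.
freq_table <- function(df, var) {
  df %>%
    count(.data[[var]]) %>%
    mutate(freq = n / sum(n))
}

# Hypothetical helper: bar chart of that table, most frequent bar first.
plot_freq <- function(df, var) {
  ggplot(freq_table(df, var),
         aes(x = reorder(.data[[var]], desc(freq)), y = freq)) +
    geom_col() +                       # same as geom_bar(stat = "identity")
    scale_y_continuous(labels = percent) +
    labs(x = var)
}

# Select factor *or* character columns (strings stay character in R >= 4.0.0)
vars_to_plot <- names(data)[sapply(data, function(col) is.factor(col) || is.character(col))]
plots <- lapply(vars_to_plot, function(v) plot_freq(data, v))
```

Printing `plots[[1]]` should draw the same chart as the pipeline in the question.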

JasonAizkalns asked Feb 13 '15 18:02





2 Answers

You need to set up a variable to hold the name of the grouping variable, because the "group by" information apparently isn't preserved in the tbl_df object after the summarise() call (summarise() drops the last grouping level, so with a single grouping variable the result is ungrouped). You could do this:

varname<-"THEME_NAME"

data %>%
  group_by_(varname) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  ggplot(eval(bquote(aes(x=reorder(.(as.name(varname)), desc(freq)), y=freq)))) +
    geom_bar(stat="identity") +
    scale_y_continuous(labels=percent)

Here we use bquote() to build the aes() call dynamically. This is only necessary because of the reorder() step you want to do; otherwise aes_string() would be much simpler.
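To see what bquote() actually builds, you can inspect the expression before it is evaluated. In this minimal base-R sketch, `.()` splices the symbol produced by as.name(varname) into the quoted aes() call, so the result is the same call you would have typed by hand. Nothing is evaluated, so ggplot2 does not even need to be loaded.

```r
varname <- "THEME_NAME"

# bquote() quotes its argument but evaluates anything wrapped in .(),
# here turning the string "THEME_NAME" into the symbol THEME_NAME.
built <- bquote(aes(x = reorder(.(as.name(varname)), desc(freq)), y = freq))

# The same call written out by hand, for comparison.
by_hand <- quote(aes(x = reorder(THEME_NAME, desc(freq)), y = freq))

identical(built, by_hand)  # TRUE
print(built)
```

With ggplot2 3.0.0 or later, aes_string() is deprecated, and `aes(x = reorder(.data[[varname]], desc(freq)), y = freq)` achieves the same result without building a call.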

If you always wanted to re-order based on the first column (meaning you would never group by more than one variable), you could do

data %>%
  group_by(THEME_NAME) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  {ggplot(., eval(substitute(aes(x=reorder(X, desc(freq)), y=freq), list(X=as.name(names(.)[1])))))  +
    geom_bar(stat="identity") +
    scale_y_continuous(labels=percent)}

which doesn't require a separate variable holding the column name, because it reads the name from names(.)[1].
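The braces around the ggplot() expression in the snippet above change how magrittr treats the dot. Per the magrittr documentation, when `.` appears only nested inside the right-hand call (as in `names(.)`), the piped value is still inserted as the first argument, whereas enclosing the right-hand side in `{}` evaluates it as written, with `.` bound to the piped value. A minimal sketch with a made-up vector `x`:

```r
library(magrittr)

x <- c(10, 20)

# Dot only nested: magrittr also inserts x as the first argument,
# so this evaluates sum(x, x[1]).
without_braces <- x %>% sum(.[1])

# Braces: the block runs as written, so this evaluates sum(x[1]).
with_braces <- x %>% { sum(.[1]) }

without_braces  # 40
with_braces     # 10
```

In the ggplot() snippet the dot is already the first argument, so the data is passed once either way; the braces make it explicit that the whole expression, including names(.)[1], refers to the piped data.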

MrFlick answered Sep 23 '22 18:09



As far as I can tell, this must be done in three parts. I ran into a few limitations, and I would appreciate someone correcting me if I am mistaken.

data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))    
my_var <- names(data)[1]

df <- data %>%
  group_by_(my_var) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  arrange(desc(freq))

df[[1]] <- factor(df[[1]], levels = unique(df[[1]]))

ggplot(df, aes_string(x = my_var, y = "freq")) +
  geom_bar(stat="identity") +
  scale_y_continuous(labels=percent)

Trying to do it all in one call, I ran into these problems:

  1. There is no way to prevent ggplot from ordering the x-axis automatically without resetting the levels of your variable before the call. The only way within the ggplot call is with reorder, which, to my knowledge, cannot be used with aes_string.
  2. Another idea I had was to use mutate to reset the levels. To pass strings, one would need the s_mutate function from dplyrExtras, but resetting levels on the piped dataset doesn't appear to work with strings.

With mutate, the statement would look like this (and it works):

mutate(THEME_NAME = factor(THEME_NAME, levels=unique(THEME_NAME)))

but with the string-accepting version, the levels remain the same:

s_mutate(my_var = factor(my_var, levels = unique(my_var)))
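In current dplyr (1.0.0 or later, which is an assumption about your setup), the string-accepting step no longer needs dplyrExtras: across() combined with all_of() applies a function to the column named in my_var, so the re-leveling can stay inside the pipeline. A minimal sketch reusing the question's data:

```r
library(dplyr)

data <- data.frame(THEME_NAME = c(rep("A", 10), rep("B", 20), rep("C", 15)))
my_var <- names(data)[1]

df <- data %>%
  group_by(.data[[my_var]]) %>%   # group by the column named in my_var
  summarise(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  arrange(desc(freq)) %>%
  # Re-level that same column in order of appearance (most frequent first)
  mutate(across(all_of(my_var), ~ factor(.x, levels = unique(.x))))

levels(df[[my_var]])  # "B" "C" "A"
```

Because the factor levels now carry the order, `ggplot(df, aes(x = .data[[my_var]], y = freq))` should keep the bars sorted without reorder() (assuming ggplot2 3.0.0 or later).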
cdeterman answered Sep 25 '22 18:09
