 

Functional programming with dplyr

I'm looking for a more efficient and elegant way to pass multiple arguments to group_by using non-standard evaluation inside a function built on dplyr. I don't want to use the ... operator; I want to specify the arguments individually.

My specific use case is a function which takes a data frame and creates a ggplot object with simpler syntax. Here is an example of the code I want to automate with my function:

# load the packages used below
library(dplyr)
library(ggplot2)

# create data frame
my_df <- data.frame(month = sample(1:12, 1000, replace = TRUE),
                    category = sample(head(letters, 3), 1000, replace = TRUE),
                    approved = as.numeric(runif(1000) < 0.5))

my_df$converted <- my_df$approved * as.numeric(runif(1000) < 0.5)

my_df %>%
  group_by(month, category) %>%
  summarize(conversion_rate = sum(converted) / sum(approved)) %>%
  ggplot() +
  geom_line(aes(x = month, y = conversion_rate, group = category,
                color = category))

I want to combine that group_by, summarize, ggplot, and geom_line into a simple function that I can feed an x, y, and group, and have it perform all the dirty work under the hood. Here's what I've gotten to work:

# create the function that does the grouping and plotting
plot_lines <- function(df, x, y, group) {

  x <- enquo(x)
  group <- enquo(group)
  group_bys <- quos(!! x, !! group)

  df %>%
    group_by(!!! group_bys) %>%
    my_smry %>%
    ggplot + geom_line(aes_(x = substitute(x), y = substitute(y), 
    group = substitute(group), color = substitute(group)))
}

# create a function to do the summarization
my_smry <- function(x) {
  x %>% 
    summarize(conversion_rate = sum(converted) / sum(approved))
}

# use my function
my_df %>% 
  plot_lines(x = month, y = conversion_rate, group = category)

I feel like the group_by handling is pretty inelegant: quoting x and group with enquo, then unquoting them with !! inside of another quoting function quos, only to re-unquote them with !!! on the next line, but it's the only thing I've been able to get to work. Is there a better way to do this?

Also, is there a way to get ggplot to take !! instead of substitute? What I'm doing feels inconsistent.

Aaron Cooley asked Nov 21 '17 00:11


1 Answer

You could just do a straight eval.parent(substitute(...)) like this. Because it is base R, it works the same way for every function in the pipeline and is simple to do. You can even use an ordinary aes.

plot_lines <- function(df, x, y, group) eval.parent(substitute(
   df %>%
      group_by(x, group) %>%
      my_smry %>%
      ggplot + geom_line(aes(x = x, y = y, group = group, color = group))
))
plot_lines(my_df, month, conversion_rate, category)
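
To see what the wrapper is doing, here is a minimal base R sketch. The helper name show_call is hypothetical and only for illustration: it returns the substituted expression instead of evaluating it, so you can inspect the code that eval.parent would go on to run in the caller's frame.

```r
# Hypothetical helper: returns the substituted call instead of evaluating it.
show_call <- function(df, x, group) substitute(
  df %>% group_by(x, group)
)

# The argument names are replaced by the expressions the caller supplied.
# Nothing is evaluated here, so neither my_df nor dplyr has to exist yet.
substituted <- show_call(my_df, month, category)
print(substituted)
```

Because the substitution happens on plain symbols, the same mechanism covers group_by and aes alike, which is why no package-specific quoting is needed.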

G. Grothendieck answered Oct 23 '22 11:10
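
For comparison, here is a sketch of the tidy-evaluation route the question asks about. It is not part of the answer above and it assumes ggplot2 3.0.0 or later, where aes() accepts !! directly. Under that assumption each argument can be captured once with enquo and unquoted wherever it is needed, so the intermediate quos / !!! step and aes_ / substitute both drop out. The seed value is arbitrary and only there to make the sample data reproducible.

```r
library(dplyr)
library(ggplot2)

# Summarization helper, as defined in the question.
my_smry <- function(x) {
  x %>%
    summarize(conversion_rate = sum(converted) / sum(approved))
}

# Assumes ggplot2 >= 3.0.0, where aes() supports !! unquoting.
plot_lines <- function(df, x, y, group) {
  # Capture each argument once as a quosure.
  x <- enquo(x)
  y <- enquo(y)
  group <- enquo(group)

  df %>%
    group_by(!!x, !!group) %>%   # unquote directly; no quos() / !!! needed
    my_smry() %>%
    ggplot() +
    geom_line(aes(x = !!x, y = !!y, group = !!group, color = !!group))
}

# Sample data built the same way as in the question.
set.seed(42)  # arbitrary seed, for reproducibility only
my_df <- data.frame(month = sample(1:12, 1000, replace = TRUE),
                    category = sample(head(letters, 3), 1000, replace = TRUE),
                    approved = as.numeric(runif(1000) < 0.5))
my_df$converted <- my_df$approved * as.numeric(runif(1000) < 0.5)

p <- plot_lines(my_df, x = month, y = conversion_rate, group = category)
```

With rlang 0.4.0 or later, the enquo / !! pairs can also be written with the {{ }} shorthand, for example group_by({{ x }}, {{ group }}).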