Summarizing by dynamic column name in dplyr

Tags: r, dplyr

I'm trying to do some programming with dplyr and am having trouble with enquo and !! evaluation.

Basically, I would like to mutate a column with a dynamic name and then manipulate that column further (e.g. summarize it). For instance:

my_function <- function(data, column) {

  quo_column <- enquo(column)

  new_col <- paste0(quo_column, "_adjusted")[2]

  data %>%
    mutate(!!new_col := (!!quo_column) + 1)
}

my_function(iris, Petal.Length)

This works great and returns a column called "Petal.Length_adjusted", which is just Petal.Length increased by one.

However I can't seem to summarize this new column.

my_function <- function(data, column) {

  quo_column <- enquo(column)

  new_col <- paste0(quo_column, "_adjusted")[2]

  mean_col <- paste0(quo_column, "_meanAdjusted")[2]

  data %>%
    mutate(!!new_col := (!!quo_column) + 1) %>%
    group_by(Species) %>%
    summarize(!!mean_col := mean(!!new_col))
}

my_function(iris, Petal.Length)

This results in a warning stating that the argument "Petal.Length_adjusted" is not numeric or logical, even though the mutate call on its own produces a numeric column.

How do I reference this dynamically generated column name so I can pass it to further dplyr functions?

asked May 25 '18 21:05 by John Harley


1 Answer

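Unlike quo_column, which is a quosure, new_col and mean_col are strings. Unquoting a string with !! inserts the string itself, so mean() receives the text "Petal.Length_adjusted" rather than the column of that name. Convert the string to a symbol with sym (from rlang) and unquote the symbol instead: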
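To make the difference concrete, here is a small sketch that builds both calls with rlang::expr() without evaluating them. The variable names string_call and symbol_call are only illustrative and are not part of the original code.

```r
library(rlang)

new_col <- "Petal.Length_adjusted"

# Unquoting the string splices a character constant into the call
string_call <- expr(mean(!!new_col))

# Unquoting a symbol splices a reference to the column into the call
symbol_call <- expr(mean(!!sym(new_col)))

print(string_call)
print(symbol_call)
```

The first call asks for the mean of a piece of text, which is what triggers the warning; the second refers to the column. The function below applies the same conversion inside summarise().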

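library(dplyr)

my_function <- function(data, column) {

  quo_column <- enquo(column)

  # paste0() on the quosure yields two strings; the second holds the column name
  new_col <- paste0(quo_column, "_adjusted")[2]

  mean_col <- paste0(quo_column, "_meanAdjusted")[2]

  data %>%
    mutate(!!new_col := (!!quo_column) + 1) %>%
    group_by(Species) %>%
    # new_col is a string, so turn it into a symbol before unquoting
    summarise(!!mean_col := mean(!!rlang::sym(new_col)))
}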

head(my_function(iris, Petal.Length))
# A tibble: 3 x 2
#  Species    Petal.Length_meanAdjusted
#  <fct>                          <dbl>
#1 setosa                          2.46
#2 versicolor                      5.26
#3 virginica                       6.55
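
As a possible variation (not part of the original answer), the sketch below looks the new column up through dplyr's .data pronoun instead of sym(), and builds the names with rlang::as_name() rather than relying on how a quosure coerces to character in paste0(). The name my_function2 is hypothetical, and the sketch assumes a dplyr version that supports .data and a column argument given as a bare name.

```r
library(dplyr)

# Hypothetical variant of my_function, for illustration only
my_function2 <- function(data, column) {
  quo_column <- enquo(column)

  # as_name() turns a quosure wrapping a bare column name into a string
  col_name <- rlang::as_name(quo_column)
  new_col  <- paste0(col_name, "_adjusted")
  mean_col <- paste0(col_name, "_meanAdjusted")

  data %>%
    mutate(!!new_col := (!!quo_column) + 1) %>%
    group_by(Species) %>%
    # .data[[...]] looks the column up by its string name
    summarise(!!mean_col := mean(.data[[new_col]]))
}

result <- my_function2(iris, Petal.Length)
print(result)
```

Both approaches should give the same per-species means; the .data form simply keeps the column name as a string throughout.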
answered Sep 19 '22 07:09 by akrun