
How to pass dynamic column names in dplyr into custom function?

I have a dataset with the following structure:

    Classes ‘tbl_df’ and 'data.frame':  10 obs. of  7 variables:
     $ GdeName  : chr  "Aeugst am Albis" "Aeugst am Albis" "Aeugst am Albis" "Aeugst am Albis" ...
     $ Partei   : chr  "BDP" "CSP" "CVP" "EDU" ...
     $ Stand1971: num  NA NA 4.91 NA 3.21 ...
     $ Stand1975: num  NA NA 5.389 0.438 4.536 ...
     $ Stand1979: num  NA NA 6.2774 0.0195 3.4355 ...
     $ Stand1983: num  NA NA 4.66 1.41 3.76 ...
     $ Stand1987: num  NA NA 3.48 1.65 5.75 ...

I want to write a function that computes the difference between any two of these columns, and I would like to do this using dplyr's mutate function like so (assume from and to are passed as arguments):

    from <- "Stand1971"
    to <- "Stand1987"

    data %>%
      mutate(diff = from - to)

Of course, this doesn't work, as dplyr uses non-standard evaluation. And I know there's now an elegant solution to the problem using mutate_, and I've read this vignette, but I still can't get my head around it.

What to do?

Here are the first few rows of the dataset for a reproducible example:

    structure(list(GdeName = c("Aeugst am Albis", "Aeugst am Albis",
    "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis",
    "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis"
    ), Partei = c("BDP", "CSP", "CVP", "EDU", "EVP", "FDP", "FGA",
    "FPS", "GLP", "GPS"), Stand1971 = c(NA, NA, 4.907306434, NA,
    3.2109535926, 18.272143463, NA, NA, NA, NA), Stand1975 = c(NA,
    NA, 5.389079711, 0.4382328556, 4.5363022622, 18.749259742, NA,
    NA, NA, NA), Stand1979 = c(NA, NA, 6.2773722628, 0.0194647202,
    3.4355231144, 25.294403893, NA, NA, NA, 2.7055961071), Stand1983 = c(NA,
    NA, 4.6609804428, 1.412940467, 3.7563539244, 26.277246489, 0.8529335746,
    NA, NA, 2.601878177), Stand1987 = c(NA, NA, 3.4767860929, 1.6535933856,
    5.7451770193, 22.146844746, NA, 3.7453183521, NA, 13.702211858
    )), .Names = c("GdeName", "Partei", "Stand1971", "Stand1975",
    "Stand1979", "Stand1983", "Stand1987"), class = c("tbl_df", "data.frame"
    ), row.names = c(NA, -10L))
grssnbchr asked Apr 16 '15 14:04



2 Answers

With dplyr 0.7 or later, you can use rlang's !! (bang-bang) operator.

    library(tidyverse)

    from <- "Stand1971"
    to <- "Stand1987"

    data %>%
      mutate(diff = (!!as.name(from)) - (!!as.name(to)))

You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately, I seem to need a few more parentheses than I would like: !! is built from R's ! operator, which has lower precedence than arithmetic operators, so without the parentheses the subtraction can be parsed as part of the operand.
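Since the question asks for a custom function, the same idea can be wrapped up roughly as follows. This is only a sketch: add_diff and the toy data frame are made-up names for illustration and are not part of dplyr or rlang.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): `from` and `to` are column
# names given as strings; the result column is called `diff`.
add_diff <- function(data, from, to) {
  data %>%
    mutate(diff = (!!as.name(from)) - (!!as.name(to)))
}

# Toy data, loosely modelled on the columns in the question
toy <- data.frame(Stand1971 = c(4.9, 3.2), Stand1987 = c(3.5, 5.7))
result <- add_diff(toy, "Stand1971", "Stand1987")
```

The strings are turned into names inside the function, so callers can keep passing plain character values.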

Original answer, for dplyr 0.3 to <0.7:

Following that vignette (vignette("nse","dplyr")), use lazyeval's interp() function:

    library(lazyeval)

    from <- "Stand1971"
    to <- "Stand1987"

    data %>%
      mutate_(diff = interp(~from - to, from = as.name(from), to = as.name(to)))
MrFlick answered Sep 20 '22 07:09



You can now use the .data pronoun inside a dplyr chain.

    library(dplyr)

    from <- "Stand1971"
    to <- "Stand1987"

    data %>%
      mutate(diff = .data[[from]] - .data[[to]])
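Wrapped in a function, the .data approach might look like the sketch below. The names col_diff and new_col are assumptions made for this example (new_col is an extra, hypothetical parameter for naming the result column), and the toy data is invented.

```r
library(dplyr)

# Hypothetical helper: `from`, `to` and `new_col` are all plain strings.
# `!!new_col :=` sets the name of the new column from a string.
col_diff <- function(data, from, to, new_col = "diff") {
  data %>%
    mutate(!!new_col := .data[[from]] - .data[[to]])
}

# Toy data with missing values, similar in spirit to the question's dataset
toy <- data.frame(a = c(1, NA, 5), b = c(0.5, 2, NA))
out <- col_diff(toy, "a", "b", new_col = "change")
```

Rows where either column is NA get NA in the new column, as with ordinary subtraction in R.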

Another option is to use sym with bang-bang (!!):

    data %>%
      mutate(diff = !!sym(from) - !!sym(to))

In base R, we can use:

    data$diff <- data[[from]] - data[[to]]
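The base R line can also be put in a small function if no dplyr dependency is wanted. Again a sketch only: add_diff_base is a made-up name, and the check on the column names is an addition of mine, not something the one-liner above does.

```r
# Hypothetical base R helper: `from` and `to` are column names as strings.
add_diff_base <- function(data, from, to) {
  # Fail early if either column name is missing from the data
  stopifnot(all(c(from, to) %in% names(data)))
  data$diff <- data[[from]] - data[[to]]
  data
}

toy <- data.frame(Stand1971 = c(4, 3), Stand1987 = c(1, 5))
res <- add_diff_base(toy, "Stand1971", "Stand1987")
```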
Ronak Shah answered Sep 22 '22 07:09
