
Using as.formula with a comma

Tags: r, dplyr, tidyeval

I'd like to get conditions dynamically from the user, so I built a shiny app that reads them from an input field. The problem is that as.formula() fails on a string that contains several comma-separated formulas (a string with a single formula works fine).

Code:

all_conditions = 
  "condition1 ~ 0,
   condition2 ~ 1,
   condition3 ~ 2"

 my_dataset %>% group_by(id) %>%
  summarise(FLAG = case_when(
      as.formula(all_conditions) )
   )

I get:

Evaluation error: :2:100: unexpected ','

I have tried using paste and escaping the comma with no success.

InterruptedException asked Aug 09 '18 08:08



3 Answers

The way you are collecting the inputs is not very practical to work with. Your problem is that you are trying to parse code that looks like this:

var1, var2, var3

Try typing that in your R console and you'll get the same error:

#> Error: unexpected ',' in "var1,"
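The same failure can be reproduced from a string, since as.formula() has to parse the text as R code before it can build a formula. A minimal base R sketch (the names `bad` and `ok` are only illustrative):

```r
# A comma-separated list is not a single R expression, so parsing fails.
bad <- tryCatch(
  parse(text = "condition1 ~ 0, condition2 ~ 1"),
  error = function(e) e
)
inherits(bad, "error")
#> [1] TRUE

# A string holding a single formula converts without trouble.
ok <- as.formula("condition1 ~ 0")
class(ok)
#> [1] "formula"
```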

So first of all, refactor your code to collect the inputs as two vectors:

cnds <- c("condition1", "condition2", "condition3")
vals <- c("1", "2", "3")

Now you have two choices to turn these strings to R code: parsing or creating symbols. You use the former when you expect arbitrary R code and the latter when you expect variable or column names. Can you spot the differences?

rlang::parse_exprs(c("foo", "bar()", "100"))
#> [[1]]
#> foo
#>
#> [[2]]
#> bar()
#>
#> [[3]]
#> [1] 100

rlang::syms(c("foo", "bar()", "100"))
#> [[1]]
#> foo
#>
#> [[2]]
#> `bar()`
#>
#> [[3]]
#> `100`

In your case you probably need parsing because the conditions will be R code. So let's start by parsing both vectors:

library(purrr)  # provides map() and map2()
cnds <- map(cnds, rlang::parse_expr)
vals <- map(vals, rlang::parse_expr)

I'm mapping parse_expr() instead of using the plural version parse_exprs() because the latter can return a list that is longer than its input. For instance, parse_exprs(c("foo; bar", "baz; bam")) turns 2 strings into a list of 4 expressions. parse_expr() returns an error if a string contains more than one expression, which makes it more robust in our case.
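A quick way to check that difference yourself, as a small sketch (the names `many` and `res` are just for illustration):

```r
library(rlang)

# parse_exprs() splits on ';', so 2 strings become 4 expressions.
many <- parse_exprs(c("foo; bar", "baz; bam"))
length(many)
#> [1] 4

# parse_expr() insists on exactly one expression per string.
res <- tryCatch(parse_expr("foo; bar"), error = function(e) e)
inherits(res, "error")
#> [1] TRUE
```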

Now we can map over these two lists of LHSs and RHSs and create the formulas. One simple way is to use quasiquotation to create the formulas by unquoting each LHS and its corresponding RHS:

fs <- map2(cnds, vals, function(c, v) rlang::expr(!!c ~ !!v))

The result is a list of formula expressions that is ready to be spliced into case_when():

data %>% mutate(result = case_when(!!!fs))

Use rlang::qq_show() to see exactly what the splice-unquoting is doing:

rlang::qq_show(mutate(result = case_when(!!!fs)))
#> mutate(result = case_when(condition1 ~ 1, condition2 ~ 2, condition3 ~ 3))
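Putting the steps together, here is a sketch of the whole pipeline on mtcars. The condition strings and values are made up for the example (they stand in for whatever the shiny inputs provide), and `cnd_exprs`, `val_exprs` and `flagged` are illustrative names:

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical user input: conditions and values as parallel character vectors.
cnds <- c("gear == 3", "gear == 4", "TRUE")
vals <- c("0", "1", "2")

# Parse each string into exactly one expression.
cnd_exprs <- map(cnds, parse_expr)
val_exprs <- map(vals, parse_expr)

# Build one `lhs ~ rhs` expression per pair.
fs <- map2(cnd_exprs, val_exprs, function(c, v) expr(!!c ~ !!v))

# Splice the formula expressions into case_when().
flagged <- mtcars %>% mutate(FLAG = case_when(!!!fs))
head(flagged$FLAG)
#> [1] 1 1 1 0 0 0
```

Keeping conditions and values in separate vectors means each piece is parsed on its own, so a malformed entry fails at the parsing step rather than somewhere inside case_when().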
Lionel Henry answered Oct 31 '22 08:10


Borrowing @phiver's example you could do:

conditions <- "gear == 3 ~ 0, gear == 4 ~ 1, TRUE ~ 2"
mtcars %>% group_by(vs) %>% 
  mutate(FLAG = eval(parse(text=sprintf("case_when(%s)",conditions))))
# # A tibble: 32 x 12
# # Groups:   vs [2]
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  FLAG
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4     1
#  2  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4     1
#  3  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1     1
#  4  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1     0
#  5  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2     0
#  6  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1     0
#  7  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4     0
#  8  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2     1
#  9  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2     1
# 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4     1

The idea is that the string cannot be evaluated on its own because it is not valid R syntax by itself. We first build a complete call around it (here with sprintf()) and then evaluate that call in place, so it runs in the right environment without any further tricks.
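A variation on the same idea, offered only as a sketch and not as part of the answer above: wrap the string in a dummy call so that it parses, pull out the arguments as individual formula expressions, and splice them into case_when() with !!!. The `list` wrapper is never evaluated, and the names `wrapped`, `fs` and `flagged` are illustrative:

```r
library(dplyr)
library(rlang)

conditions <- "gear == 3 ~ 0, gear == 4 ~ 1, TRUE ~ 2"

# Wrap the string in a call so it becomes a single parsable expression.
wrapped <- parse_expr(sprintf("list(%s)", conditions))

# Drop the function name and keep the three formula expressions.
fs <- as.list(wrapped)[-1]

flagged <- mtcars %>%
  group_by(vs) %>%
  mutate(FLAG = case_when(!!!fs))
```

Because the parser does the splitting, commas inside a condition (for example in a function call) are handled correctly. As with eval(parse()), the conditions are still run as R code, so this is only appropriate when you trust whoever types them.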

Moody_Mudskipper answered Oct 31 '22 08:10


You need to put every condition in a list and use quosures and quasiquotation (!!!) to get it to work. I will use mtcars as an example, following your code example.

library(dplyr)
# create list of quosures
conditions <- list(quo(gear == 3 ~ 0), 
     quo(gear == 4 ~ 1),
     quo(TRUE ~ 2))


mtcars %>% group_by(vs) %>% 
  mutate(FLAG = case_when(!!! conditions)) # quasiquotation using !!! to splice the list
# A tibble: 32 x 12
# Groups:   vs [2]
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  FLAG
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     1
#  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     1
#  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     1
#  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     0
#  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2     0
#  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     0
#  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4     0
#  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     1
#  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     1
# 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     1
# ... with 22 more rows
phiver answered Oct 31 '22 08:10