
Is there a function like switch which works inside of dplyr::mutate?

Tags: r, dplyr

I can't use switch inside of mutate because mutate hands it the whole column vector instead of just the current row's value. As a hack, I'm using:

    pick <- function(x, v1, v2, v3, v4) {
        ifelse(x == 1, v1,
               ifelse(x == 2, v2,
                      ifelse(x == 3, v3,
                             ifelse(x == 4, v4, NA))))
    }

This works inside of mutate, and is fine for now because I'm typically choosing among 4 things, but that may change. Can you recommend an alternative?

For example:

    library(dplyr)
    df.faithful <- tbl_df(faithful)
    df.faithful$x  <- sample(1:4, 272, rep=TRUE)
    df.faithful$y1 <- rnorm(n=272, mean=7, sd=2)
    df.faithful$y2 <- rnorm(n=272, mean=5, sd=2)
    df.faithful$y3 <- rnorm(n=272, mean=7, sd=1)
    df.faithful$y4 <- rnorm(n=272, mean=5, sd=1)

Using pick:

    mutate(df.faithful, y = pick(x, y1, y2, y3, y4))

    Source: local data frame [272 x 8]

       eruptions waiting x        y1        y2       y3       y4        y
    1      3.600      79 1  8.439092 5.7753006 8.319372 5.078558 8.439092
    2      1.800      54 2 13.515956 6.1971512 6.343157 4.962349 6.197151
    3      3.333      74 4  7.693941 6.8973365 5.406684 5.425404 5.425404
    4      2.283      62 4 12.595852 6.9953995 7.864423 3.730967 3.730967
    5      4.533      85 3 11.952922 5.1512987 9.177687 5.511899 9.177687
    6      2.883      55 3  7.881350 1.0289711 6.304004 3.554056 6.304004
    7      4.700      88 4  8.636709 6.3046198 6.788619 5.748269 5.748269
    8      3.600      85 1  8.027371 6.3535056 7.152698 7.034976 8.027371
    9      1.950      51 1  5.863370 0.1707758 5.750440 5.058107 5.863370
    10     4.350      85 1  7.761653 6.2176610 8.348378 1.861112 7.761653
    ..       ...     ... .       ...       ...      ...      ...      ...

We see that the value from y1 is copied into y if x == 1, and so on. This is what I'm looking to do, but I want it to work whether I have 4 columns or 400.

Trying to use switch:

    mutate(df.faithful, y = switch(x, y1, y2, y3, y4))

    Error in switch(c(1L, 2L, 4L, 4L, 3L, 3L, 4L, 1L, 1L, 1L, 4L, 3L, 1L,  :
      EXPR must be a length 1 vector

Trying to use list:

    mutate(df.faithful, y = list(y1, y2, y3, y4)[[x]])

    Error in list(c(8.43909205142925, 13.5159559591257, 7.69394050059568,  :
      recursive indexing failed at level 2

Trying to use c:

    mutate(df.faithful, y = c(y1, y2, y3, y4)[x])

    Source: local data frame [272 x 8]

       eruptions waiting x        y1        y2       y3       y4         y
    1      3.600      79 1  8.439092 5.7753006 8.319372 5.078558  8.439092
    2      1.800      54 2 13.515956 6.1971512 6.343157 4.962349 13.515956
    3      3.333      74 4  7.693941 6.8973365 5.406684 5.425404 12.595852
    4      2.283      62 4 12.595852 6.9953995 7.864423 3.730967 12.595852
    5      4.533      85 3 11.952922 5.1512987 9.177687 5.511899  7.693941
    6      2.883      55 3  7.881350 1.0289711 6.304004 3.554056  7.693941
    7      4.700      88 4  8.636709 6.3046198 6.788619 5.748269 12.595852
    8      3.600      85 1  8.027371 6.3535056 7.152698 7.034976  8.439092
    9      1.950      51 1  5.863370 0.1707758 5.750440 5.058107  8.439092
    10     4.350      85 1  7.761653 6.2176610 8.348378 1.861112  8.439092
    ..       ...     ... .       ...       ...      ...      ...       ...

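No errors are produced, but the behavior is not as intended: c() concatenates the four columns into one long vector, so [x] only ever picks among its first four elements (y1[1] to y1[4]).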

Asked by wdkrnls, Feb 18 '15 20:02





1 Answer

Eons too late for the OP, but in case this shows up in a search ...

dplyr v0.5 has recode(), a vectorized version of switch(), so

    data_frame(
      x  = sample(1:4, 10, replace=TRUE),
      y1 = rnorm(n=10, mean=7, sd=2),
      y2 = rnorm(n=10, mean=5, sd=2),
      y3 = rnorm(n=10, mean=7, sd=1),
      y4 = rnorm(n=10, mean=5, sd=1)
    ) %>%
      mutate(y = recode(x, y1, y2, y3, y4))

produces, as anticipated:

    # A tibble: 10 x 6
           x        y1       y2       y3       y4        y
       <int>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
    1      2  6.950106 6.986780 7.826778 6.317968 6.986780
    2      1  5.776381 7.706869 7.982543 5.048649 5.776381
    3      2  7.315477 2.213855 6.079149 6.070598 2.213855
    4      3  7.461220 5.100436 7.085912 4.440829 7.085912
    5      3  5.780493 4.562824 8.311047 5.612913 8.311047
    6      3  5.373197 7.657016 7.049352 4.470906 7.049352
    7      2  6.604175 9.905151 8.359549 6.430572 9.905151
    8      3 11.363914 4.721148 7.670825 5.317243 7.670825
    9      3 10.123626 7.140874 6.718351 5.508875 6.718351
    10     4  5.407502 4.650987 5.845482 4.797659 4.797659

(Also works with named args, including character and factor x's.)
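For the "4 or 400 columns" case in the question, listing every column by hand gets unwieldy. One possible alternative, offered only as a sketch and not as part of the answer above, is base R matrix indexing: build a two-column (row, column) index and subset a matrix of the candidate columns with it. The helper name pick_col, the data frame df, and the fixed seed are assumptions made for illustration; the columns are assumed to share one type (numeric here) and x is assumed to hold valid positions into cols.

```r
# Sketch under stated assumptions: pick_col() is a hypothetical helper,
# not a dplyr function. Base R only, so it runs without extra packages.
set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible
n <- 10
df <- data.frame(
  x  = sample(1:4, n, replace = TRUE),
  y1 = rnorm(n, mean = 7, sd = 2),
  y2 = rnorm(n, mean = 5, sd = 2),
  y3 = rnorm(n, mean = 7, sd = 1),
  y4 = rnorm(n, mean = 5, sd = 1)
)

# For row i, return the value found in column cols[idx[i]].
pick_col <- function(data, idx, cols) {
  m <- as.matrix(data[cols])         # candidate columns as one matrix
  m[cbind(seq_len(nrow(m)), idx)]    # one (row, column) pair per row
}

cols <- paste0("y", 1:4)             # could just as well be 400 names
df$y <- pick_col(df, df$x, cols)
```

On an ungrouped data frame the same helper should also be usable inside mutate, for example mutate(df, y = pick_col(df, x, cols)), since it only needs the data frame and the index vector.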

Answered by user6702291, Sep 17 '22 18:09
