Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr mutate new dynamic variables with case_when

Tags:

dynamic

r

dplyr

I'm aware of similar questions here and here, but I haven't been able to figure out the right solution for my specific situation. Some of what I'm finding are solutions which use mutate_, etc but I understand these are now obsolete. I'm new to dynamic usages of dplyr.

I have a dataframe which includes some variables with two different prefixes, alpha and beta:

df <- data.frame(alpha.num = c(1, 3, 5, 7),
             alpha.char = c("a", "c", "e", "g"),
             beta.num = c(2, 4, 6, 8),
             beta.char = c("b", "d", "f", "h"),
             which.to.use = c("alpha", "alpha", "beta", "beta"))

I want to create new variables with the prefix "chosen." which are copies of either the "alpha" or "beta" columns depending on which is named for that row in the "which.to.use" column. The desired output would be:

desired.df <- data.frame(alpha.num = c(1, 3, 5, 7),
                     alpha.char = c("a", "c", "e", "g"),
                     beta.num = c(2, 4, 6, 8),
                     beta.char = c("b", "d", "f", "h"),
                     which.to.use = c("alpha", "alpha", "beta", "beta"),
                     chosen.num = c(1, 3, 6, 8),
                     chosen.char = c("a", "c", "f", "h"))

My failed attempt:

varnames <- c("num", "char")
df %<>%
  mutate(as.name(paste0("chosen.", varnames)) := case_when(
    which.to.use == "alpha" ~ paste0("alpha.", varnames),
    which.to.use == "beta" ~ pasteo("beta.", varnames)
  ))

I'd prefer a pure dplyr solution, and even better would be one which could be included in a longer pipe modifying the df (i.e. no need to stop to create "varnames"). Thanks for your help.

like image 716
lost Avatar asked Aug 02 '18 02:08

lost


3 Answers

Using some fun rlang stuff & purrr:

library(rlang)
library(purrr)
library(dplyr)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = F)

c("num", "char") %>% 
    map(~ mutate(df, !!sym(paste0("chosen.", .x)) := 
      case_when(
          which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
          which.to.use == "beta" ~ !!sym(paste0("beta.", .x))
                ))) %>% 
    reduce(full_join)

Result:

  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1         1          a        2         b        alpha          1           a
2         3          c        4         d        alpha          3           c
3         5          e        6         f         beta          6           f
4         7          g        8         h         beta          8           h

Without reduce(full_join):

c("num", "char") %>% 
  map_dfc(~ mutate(df, !!sym(paste0("chosen.", .x)) := 
                 case_when(
                   which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
                   which.to.use == "beta" ~ !!sym(paste0("beta.", .x))
                 ))) %>% 
  select(-ends_with("1"))



alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1         1          a        2         b        alpha          1           a
2         3          c        4         d        alpha          3           c
3         5          e        6         f         beta          6           f
4         7          g        8         h         beta          8           h

Explanation:
(Note: I do not fully or even kind of get rlang. Maybe others can give a better explanation ;).)

Using paste0 by itself produces a string, when we need a bare name for mutate to know it is referring to a variable name.

If we wrap paste0 in sym, it evaluates to a bare name:

> x <- varrnames[1]
> sym(paste0("alpha.", x))
  alpha.num

But mutate does not know to evaluate and instead read it as a symbol:

> typeof(sym(paste0("alpha.", x)))
[1] "symbol"

The "bang bang" !! operator evaluates the sym function. Compare:

> expr(mutate(df, var = sym(paste0("alpha.", x))))
mutate(df, var = sym(paste0("alpha.", x)))

> expr(mutate(df, var = !!sym(paste0("alpha.", x))))
mutate(df, var = alpha.num)

So with !!sym we can use paste to dynamically called variable names with dplyr.

like image 187
paqmo Avatar answered Oct 10 '22 02:10

paqmo


This is a nest()/map() strategy that should be pretty fast. It stays in the tidyverse, but doesn't go into rlang land.

library(tidyverse)

df %>% 
    nest(-which.to.use) %>%
    mutate(new_data = map2(data, which.to.use,
                       ~ select(..1, matches(..2)) %>%
                           rename_all(funs(gsub(".*\\.", "choosen.", .) )))) %>%
    unnest()

  which.to.use alpha.num alpha.char beta.num beta.char choosen.num choosen.char
1        alpha         1          a        2         b           1            a
2        alpha         3          c        4         d           3            c
3         beta         5          e        6         f           6            f
4         beta         7          g        8         h           8            h

It grabs all columns, not just num and char, that are not which.to.use. But that seems like what you (I) would want IRL. You could add a select(matches('(var1|var2|etc')) line before you call nest() if you wanted to pull only specific variables.

EDIT: My original suggestion of using select() to drop unneeded columns would result in doing a join to bring them back later. If instead you adjust the nest parameters, you can acheive this on only certain columns.

I added new bool columns here, but they will be ignored for the "choosen" selection:

new_df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 alpha.bool = FALSE,
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 beta.bool = TRUE,
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

new_df %>% 
    nest(matches("num|char")) %>% # only columns that match this pattern get nested, allows you to save others
    mutate(new_data = map2(data, which.to.use,
                           ~ select(..1, matches(..2)) %>%
                               rename_all(funs(gsub(".*\\.", "choosen.", .) )))) %>%
    unnest()

  alpha.bool beta.bool which.to.use alpha.num alpha.char beta.num beta.char choosen.num choosen.char
1      FALSE      TRUE        alpha         1          a        2         b           1            a
2      FALSE      TRUE        alpha         3          c        4         d           3            c
3      FALSE      TRUE         beta         5          e        6         f           6            f
4      FALSE      TRUE         beta         7          g        8         h           8            h
like image 3
Nate Avatar answered Oct 10 '22 01:10

Nate


A base R approach using apply with margin = 1 where we select columns for each row based on the value in which.to.use column and get the value from corresponding column for the row.

df[c("chosen.num", "chosen.char")] <- 
          t(apply(df, 1, function(x) x[grepl(x["which.to.use"], names(df))]))

df
#  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
#1         1          a        2         b        alpha          1           a
#2         3          c        4         d        alpha          3           c
#3         5          e        6         f         beta          6           f
#4         7          g        8         h         beta          8           h
like image 2
Ronak Shah Avatar answered Oct 10 '22 01:10

Ronak Shah