Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using dplyr mutate_at when a function takes multiple arguments which are different columns

Tags:

r

dplyr

I have a data.frame with a large number of columns whose names follow a pattern. Such as:

df <- data.frame(
  x_1 = c(1, NA, 3), 
  x_2 = c(1, 2, 4), 
  y_1 = c(NA, 2, 1), 
  y_2 = c(5, 6, 7)
)

I would like to apply mutate_at to perform the same operation on each pair of columns. As in:

df %>%
  mutate(
    x = ifelse(is.na(x_1), x_2, x_1), 
    y = ifelse(is.na(y_1), y_2, y_1)
  )

Is there a way I can do that with mutate_at/mutate_each?

This:

df %>%
  mutate_each(vars(x_1, y_1), funs(ifelse(is.na(.), vars(x_2, y_2), .)))

and various variations I've tried all fail.

The question is similar to Using functions of multiple columns in a dplyr mutate_at call, but different in that the second argument to the function call is not a single column, but a different column for each column in vars.

Thanks in advance.

like image 496
Bob Avatar asked Nov 08 '22 11:11

Bob


2 Answers

I don't know if you can get it that way, but here's a different perspective on the problem. If you find yourself with really wide data (e.g., tons of columns with similar names) and you want to do something with them, it might help to tidy the data (long in stata terms) with tidyr::gather (see docs here http://tidyr.tidyverse.org/).

> df %>% gather()
   key value
1  x_1     1
2  x_1    NA
3  x_1     3
4  x_2     1
5  x_2     2
6  x_2     4
7  y_1    NA
8  y_1     2
9  y_1     1
10 y_2     5
11 y_2     6
12 y_2     7

After converting the data to this format, it's easier to combine and rearrange values using group_by instead of trying to mutate_at things. E.g., you can ge the first values with df %>% gather() %>% mutate(var = substr(key,1,1)) and manipulate the xs and ys differently using group_by(var).

like image 69
twedl Avatar answered Nov 14 '22 22:11

twedl


Old question, but I agree with Jesse that you need to tidy your data a bit. gather would be the way to go, but it lacks somehow the possibility of stats::reshape where you can specify groups of columns to gather. So here's a solution with reshape:

df %>% 
   reshape(varying   = list(c("x_1", "y_1"), c("x_2", "y_2")), 
           times     = c("x", "y"),
           direction = "long") %>% 
   mutate(x = ifelse(is.na(x_1), x_2, x_1)) %>% 
   reshape(idvar     = "id", 
           timevar   = "time",
           direction = "wide") %>% 
   rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
#   id x_1 x_2 x y_1 y_2 y
# 1  1   1   1 1  NA   5 5
# 2  2  NA   2 2   2   6 2
# 3  3   3   4 3   1   7 1

In order to do that with any number of column pairs, you could do something like:

df2 <- setNames(cbind(df, df), c(t(outer(letters[23:26], 1:2, paste, sep = "_"))))
v <- split(names(df2), purrr::map_chr(names(df2), ~ gsub(".*_(.*)", "\\1", .)))
n <- unique(purrr::map_chr(names(df2), ~ gsub("_[0-9]+", "", .) ))
df2 %>% 
    reshape(varying   = v, 
            times     = n,
            direction = "long") %>% 
     mutate(x = ifelse(is.na(!!sym(v[[1]][1])), !!sym(v[[2]][1]), !!sym(v[[1]][1]))) %>% 
     reshape(idvar     = "id", 
             timevar   = "time",
             direction = "wide") %>% 
     rename_all(funs(gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .)))
#   id w_1 w_2 w x_1 x_2 x y_1 y_2 y z_1 z_2 z
# 1  1   1   1 1  NA   5 5   1   1 1  NA   5 5
# 2  2  NA   2 2   2   6 2  NA   2 2   2   6 2
# 3  3   3   4 3   1   7 1   3   4 3   1   7 1

This assumes that columns which should be compared are next to each other and that all columns for with possible NA values are in columns suffixed by _1 and the replacement value columns are sufficed by _2.

like image 36
thothal Avatar answered Nov 14 '22 21:11

thothal