I have a data.frame with a large number of columns whose names follow a pattern, such as:
library(dplyr)

df <- data.frame(
  x_1 = c(1, NA, 3),
  x_2 = c(1, 2, 4),
  y_1 = c(NA, 2, 1),
  y_2 = c(5, 6, 7)
)
I would like to apply mutate_at to perform the same operation on each pair of columns, as in:
df %>%
  mutate(
    x = ifelse(is.na(x_1), x_2, x_1),
    y = ifelse(is.na(y_1), y_2, y_1)
  )
Is there a way I can do that with mutate_at/mutate_each?
This:
df %>%
mutate_each(vars(x_1, y_1), funs(ifelse(is.na(.), vars(x_2, y_2), .)))
and the other variations I've tried all fail.
The question is similar to Using functions of multiple columns in a dplyr mutate_at call, but different in that the second argument to the function call is not a single column, but a different column for each column in vars.
Thanks in advance.
I don't know if you can get it that way, but here's a different perspective on the problem. If you find yourself with really wide data (e.g., tons of columns with similar names) and you want to do something with them, it might help to tidy the data (reshape it to long format, in Stata terms) with tidyr::gather (see the docs at http://tidyr.tidyverse.org/).
> df %>% gather()
   key value
1  x_1     1
2  x_1    NA
3  x_1     3
4  x_2     1
5  x_2     2
6  x_2     4
7  y_1    NA
8  y_1     2
9  y_1     1
10 y_2     5
11 y_2     6
12 y_2     7
After converting the data to this format, it's easier to combine and rearrange values using group_by instead of trying to mutate_at things. E.g., you can extract the variable prefix (the first character of key) with df %>% gather() %>% mutate(var = substr(key, 1, 1)) and then manipulate the xs and ys separately using group_by(var).
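To make that concrete, here is a minimal sketch of how the long format could be carried through to the x and y columns the question asks for. The helper names id, var, idx and combined are my own, and I'm assuming every column name has the form prefix_number; treat it as one possible way to finish the idea, not the only one.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  x_1 = c(1, NA, 3),
  x_2 = c(1, 2, 4),
  y_1 = c(NA, 2, 1),
  y_2 = c(5, 6, 7)
)

# Go long, keeping a row id so the values can be put back together later.
long <- df %>%
  mutate(id = row_number()) %>%
  gather(key, value, -id) %>%
  # Split "x_1" into var = "x" and idx = 1 (assumes a prefix_number naming scheme).
  separate(key, into = c("var", "idx"), sep = "_", convert = TRUE)

# For each row and variable, take the first non-missing value in idx order.
combined <- long %>%
  arrange(id, var, idx) %>%
  group_by(id, var) %>%
  summarise(value = value[!is.na(value)][1], .groups = "drop") %>%
  spread(var, value)

combined
```

The result has one row per original row and one column per prefix, so it can be joined back to df by id if the original columns are still needed.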
Old question, but I agree with Jesse that you need to tidy your data a bit. gather would be the way to go, but it lacks the ability of stats::reshape to specify groups of columns to gather. So here's a solution with reshape:
df %>%
  reshape(varying = list(c("x_1", "y_1"), c("x_2", "y_2")),
          times = c("x", "y"),
          direction = "long") %>%
  mutate(x = ifelse(is.na(x_1), x_2, x_1)) %>%
  reshape(idvar = "id",
          timevar = "time",
          direction = "wide") %>%
  # funs() is deprecated in recent dplyr; a formula lambda does the same renaming
  rename_all(~ gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .))
# id x_1 x_2 x y_1 y_2 y
# 1 1 1 1 1 NA 5 5
# 2 2 NA 2 2 2 6 2
# 3 3 3 4 3 1 7 1
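To see why a single mutate on x_1 and x_2 covers both pairs, it may help to look at the intermediate long data on its own. This sketch uses only base R; as far as I understand reshape's defaults, when varying is a list and v.names is not given, the long columns are named after the first entry of each vector (here x_1 and x_2), which is why the y values end up stacked under those names too.

```r
df <- data.frame(
  x_1 = c(1, NA, 3),
  x_2 = c(1, 2, 4),
  y_1 = c(NA, 2, 1),
  y_2 = c(5, 6, 7)
)

# Stack the "_1" columns into one column and the "_2" columns into another;
# the time column records which prefix ("x" or "y") each row came from.
long <- reshape(
  df,
  varying = list(c("x_1", "y_1"), c("x_2", "y_2")),
  times = c("x", "y"),
  direction = "long"
)

names(long)                   # includes time, x_1, x_2 and id
long[long$time == "y", ]      # the y_1 / y_2 values, stored under x_1 / x_2
```

So inside the long data, ifelse(is.na(x_1), x_2, x_1) is applied to the rows for x and the rows for y alike, and the second reshape spreads the result back out per prefix.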
In order to do that with any number of column pairs, you could do something like:
df2 <- setNames(cbind(df, df), c(t(outer(letters[23:26], 1:2, paste, sep = "_"))))
v <- split(names(df2), purrr::map_chr(names(df2), ~ gsub(".*_(.*)", "\\1", .)))
n <- unique(purrr::map_chr(names(df2), ~ gsub("_[0-9]+", "", .) ))
df2 %>%
  reshape(varying = v,
          times = n,
          direction = "long") %>%
  mutate(x = ifelse(is.na(!!sym(v[[1]][1])), !!sym(v[[2]][1]), !!sym(v[[1]][1]))) %>%
  reshape(idvar = "id",
          timevar = "time",
          direction = "wide") %>%
  # funs() is deprecated in recent dplyr; a formula lambda does the same renaming
  rename_all(~ gsub("[a-zA-Z]+(_*)([0-9]*)\\.([a-zA-Z]+)", "\\3\\1\\2", .))
# id w_1 w_2 w x_1 x_2 x y_1 y_2 y z_1 z_2 z
# 1 1 1 1 1 NA 5 5 1 1 1 NA 5 5
# 2 2 NA 2 2 2 6 2 NA 2 2 2 6 2
# 3 3 3 4 3 1 7 1 3 4 3 1 7 1
This assumes that the columns to be compared are next to each other, that all columns with possible NA values are suffixed by _1, and that the replacement value columns are suffixed by _2.
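If the adjacency assumption is a concern, the pairs can also be matched by name rather than by position. The following is only a sketch of that idea, not the reshape approach above: it swaps in dplyr::coalesce and a plain loop, and the prefixes variable is my own name. It still assumes each prefix has exactly one _1 and one _2 column.

```r
library(dplyr)

df <- data.frame(
  x_1 = c(1, NA, 3),
  x_2 = c(1, 2, 4),
  y_1 = c(NA, 2, 1),
  y_2 = c(5, 6, 7)
)

# Derive the distinct prefixes ("x", "y") from the column names.
prefixes <- unique(sub("_[0-9]+$", "", names(df)))

# For each prefix, take the _1 value and fall back to _2 where _1 is NA.
for (p in prefixes) {
  df[[p]] <- coalesce(df[[paste0(p, "_1")]], df[[paste0(p, "_2")]])
}

df
```

Because the columns are looked up by their constructed names, their order in the data frame does not matter here.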