I've been trying for awhile to combine mutate_at
with coalesce
in case in which names of columns are generated dynamically.
In my example there are only five columns, but in the real data there are much more (and not all columns should be included in coalesce
step).
Example DF:
data_example <- data.frame(
aa = c(1, NA, NA),
bb = c(NA, NA, 2),
cc = c(6, 7, 8),
aa_extra = c(2, 2, NA),
bb_extra = c(1, 2, 3)
)
Expected output:
aa bb cc aa_extra bb_extra
1 1 1 6 2 1
2 2 2 7 2 2
3 NA 2 8 NA 3
output as structure
:
structure(list(aa = c(1, 2, NA), bb = c(1, 2, 2), cc = c(6, 7,
8), aa_extra = c(2, 2, NA), bb_extra = c(1, 2, 3)), class = "data.frame", row.names = c(NA,
-3L))
I've tried something like this, but without success ("Only strings can be converted to symbols"). I would like to avoid creation of extra variables, just include everything in mutate_at
expression, since this is a part of longer dplyr "flow".
data_example %>%
dplyr::mutate_at(
gsub("_extra", "", grep("_extra$",
colnames(.),
perl = T,
value = T)),
dplyr::funs(
dplyr::coalesce(., !!! dplyr::sym(paste0(., "_extra")))
)
)
I've tried also this (no error, but values for column bb
are wrong):
data_example %>%
dplyr::mutate_at(
gsub("_extra", "", grep("_extra$",
colnames(.),
perl = T,
value = T)),
dplyr::funs(
dplyr::coalesce(., !!as.name(paste0(names(.), "_extra")))
)
)
How to get the name of processed column and pass it to coalesce
?
Guess it is now possible to achieve the desired outcome using mutate + across
data_example %>%
mutate(across(c(str_subset(names(.), "_extra") %>% str_remove("_extra")) ,
~ coalesce( ., get(str_c(cur_column(), "_extra")) )))
aa bb cc aa_extra bb_extra
1 1 1 6 2 1
2 2 2 7 2 2
3 NA 2 8 NA 3
We can split
the dataset into a list
of data.frames after removing the substring of column names ("_extra"
), then with map
loop through the list
, coalesce
the column and then bind
with the "_extra" columns in the original dataset
library(tidyverse)
data_example %>%
split.default(str_remove(names(.), "_extra")) %>%
map_df(~ coalesce(!!! .x)) %>%
#or use
# map_df(reduce, coalesce) %>%
bind_cols(., select(data_example, ends_with("extra")))
# A tibble: 3 x 5
# aa bb cc aa_extra bb_extra
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 6 2 1
#2 2 2 7 2 2
#3 NA 2 8 NA 3
Using data.table
for melt
and dcast
since I can never remember how spread
and gather
work
library(data.table)
library(dplyr)
data_example %>%
mutate(row = row_number()) %>%
melt('row') %>%
group_by(g = sub('_*$', '', variable), row) %>%
mutate(value = reduce(value, coalesce)) %>%
dcast(row ~ variable) %>%
select(-row)
# aa bb cc aa_extra bb_extra
# 1 1 1 6 1 1
# 2 2 2 7 2 2
# 3 NA 2 8 NA 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With