Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr: mutate_at + coalesce: dynamic names of columns

Tags:

r

dplyr

rlang

I've been trying for awhile to combine mutate_at with coalesce in case in which names of columns are generated dynamically.

In my example there are only five columns, but in the real data there are much more (and not all columns should be included in coalesce step).

Example DF:

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

Expected output:

  aa bb cc aa_extra bb_extra
1  1  1  6        2        1
2  2  2  7        2        2
3 NA  2  8       NA        3

output as structure:

structure(list(aa = c(1, 2, NA), bb = c(1, 2, 2), cc = c(6, 7, 
8), aa_extra = c(2, 2, NA), bb_extra = c(1, 2, 3)), class = "data.frame", row.names = c(NA, 
-3L))

I've tried something like this, but without success ("Only strings can be converted to symbols"). I would like to avoid creation of extra variables, just include everything in mutate_at expression, since this is a part of longer dplyr "flow".

data_example %>%
  dplyr::mutate_at(
    gsub("_extra", "", grep("_extra$",
                            colnames(.),
                            perl = T,
                            value = T)),
    dplyr::funs(
      dplyr::coalesce(., !!! dplyr::sym(paste0(., "_extra")))
    )
  )

I've tried also this (no error, but values for column bb are wrong):

data_example %>%
  dplyr::mutate_at(
    gsub("_extra", "", grep("_extra$",
                            colnames(.),
                            perl = T,
                            value = T)),
    dplyr::funs(
      dplyr::coalesce(., !!as.name(paste0(names(.), "_extra")))
    )
  )

How to get the name of processed column and pass it to coalesce?

like image 775
RSzT Avatar asked Mar 03 '19 17:03

RSzT


3 Answers

Guess it is now possible to achieve the desired outcome using mutate + across

data_example %>% 
  mutate(across(c(str_subset(names(.), "_extra") %>% str_remove("_extra")) ,
                ~ coalesce( ., get(str_c(cur_column(), "_extra"))  ))) 

  aa bb cc aa_extra bb_extra
1  1  1  6        2        1
2  2  2  7        2        2
3 NA  2  8       NA        3
like image 150
Yonghao Avatar answered Sep 28 '22 01:09

Yonghao


We can split the dataset into a list of data.frames after removing the substring of column names ("_extra"), then with map loop through the list, coalesce the column and then bindwith the "_extra" columns in the original dataset

library(tidyverse)
data_example %>% 
   split.default(str_remove(names(.), "_extra")) %>%
   map_df(~ coalesce(!!! .x)) %>%
   #or use
   # map_df(reduce, coalesce) %>%
   bind_cols(., select(data_example, ends_with("extra")))
# A tibble: 3 x 5
#     aa    bb    cc aa_extra bb_extra
#  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#1     1     1     6        2        1
#2     2     2     7        2        2
#3    NA     2     8       NA        3
like image 42
akrun Avatar answered Sep 28 '22 01:09

akrun


Using data.table for melt and dcast since I can never remember how spread and gather work

library(data.table)
library(dplyr)

data_example %>% 
  mutate(row = row_number()) %>% 
  melt('row') %>% 
  group_by(g = sub('_*$', '', variable), row) %>% 
  mutate(value = reduce(value, coalesce)) %>% 
  dcast(row ~ variable) %>% 
  select(-row)

#   aa bb cc aa_extra bb_extra
# 1  1  1  6        1        1
# 2  2  2  7        2        2
# 3 NA  2  8       NA        2
like image 43
IceCreamToucan Avatar answered Sep 28 '22 00:09

IceCreamToucan