Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mutate to concatenate columns that contain a specific string in their names

I'm trying to create a new column that concatenates all values of a specific set of columns that contain a specific string in their names, with a semi-colon separator. I'm working within dplyr, so I'm seeking a tidyverse solution.

I attempted to use grepl() in combination with mutate(), case_when(), and paste() to identify the columns that contained the string I wanted in their names ("Games") and concatenate their contents together into a new column. When that failed, I attempted to use str_detect instead, but that didn't work out.

As far as I can tell, my issue is that I can't correctly instruct the code to evaluate all column names and then return the ones that have a string that contains my specified pattern. I've tried using contains("Games"), colnames(.x), and other variations on those arguments. I know I could do this if I specifically named every column I wished to paste together, but I prefer a relative solution that prevents me from typing out multiple names.

Thank you!

# Sample Data

test<-as_tibble(data.frame(`ID` = c("1","2","3"),
                           `Gender` = c("Female","Male","Non-Binary"),
                           `Games_Chess`=c("Chess",NA,"Chess"),
                           `Games_Clue`=c("Clue",NA,NA),
                           `Games_Scrabble`=c("Scrabble",NA,"Scrabble")))
# A tibble: 3 × 5
  ID    Gender     Games_Chess Games_Clue Games_Scrabble
  <chr> <chr>      <chr>       <chr>      <chr>         
1 1     Female     Chess       Clue       Scrabble      
2 2     Male       NA          NA         NA            
3 3     Non-Binary Chess       NA         Scrabble  

# Desired Output

ID    Gender     Games_Chess Games_Clue Games_Scrabble Games    
1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
2     Male       NA          NA         NA             NA
3     Non-Binary Chess       NA         Scrabble       Chess; Scrabble  

# Attempted Code 1

test<-test%>%
  mutate(`Games` = case_when(str_detect(colnames(test),"Games") ~ paste(.x, collapse = ";"), TRUE ~ NA))

# Error Code 1

Error in `mutate()`:
ℹ In argument: `Games = case_when(...)`.
Caused by error in `case_when()`:
! Failed to evaluate the right-hand side of formula 1.
Caused by error:
! object '.x' not found

# Attempted Code 2
test<-test%>%
  mutate(`Games` = case_when(grepl("Games",.) ~ paste(., collapse = ";"), TRUE ~ NA))

# Error Code 2

Error in `mutate()`:
ℹ In argument: `Games = case_when(...)`.
Caused by error:
! `Games` must be size 3 or 1, not 4.
Run `rlang::last_trace()` to see where the error occurred.
like image 753
Mary Rachel Avatar asked Oct 20 '25 02:10

Mary Rachel


2 Answers

Though not entirely dplyr-based, the following solution does still fall under the tidyverse using tidyr:

test %>%
  tidyr::unite(
    # Name of new column
    col = "Games",
    # Select columns to unite using tidy-select syntax
    dplyr::starts_with("Games"),
    # Specify semi-colon as separator
    sep = "; ",
    # Keep original Games_* columns
    remove = FALSE,
    # Remove NA's prior to concatenation
    na.rm = TRUE
  )

Which results in the following output:

#>   ID     Gender                 Games Games_Chess Games_Clue Games_Scrabble
#> 1  1     Female Chess; Clue; Scrabble       Chess       Clue       Scrabble
#> 2  2       Male                              <NA>       <NA>           <NA>
#> 3  3 Non-Binary       Chess; Scrabble       Chess       <NA>       Scrabble

Edit: optionally, if you'd like the new column to be at the very end of your data, then you can add the following code at the end using the %>% pipe operator to move the new column to the end:

dplyr::relocate(
    Games,
    .after = dplyr::everything()
  )
like image 132
cristian-vargas Avatar answered Oct 21 '25 16:10

cristian-vargas


rowwise() is usually considered slow, but it's perfectly fine to use unless you're dealing with large amounts of data

library(dplyr)

test |>
  rowwise() |>
  mutate(Games = paste(na.omit(c_across(c(starts_with("Games")))), collapse = "; ")) |>
  ungroup() |>
  mutate(Games = sub("^$", NA, Games))

#> # A tibble: 3 × 6
#>   ID    Gender     Games_Chess Games_Clue Games_Scrabble Games                
#>   <chr> <chr>      <chr>       <chr>      <chr>          <chr>                
#> 1 1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
#> 2 2     Male       <NA>        <NA>       <NA>           <NA>                 
#> 3 3     Non-Binary Chess       <NA>       Scrabble       Chess; Scrabble

You can also define a complicated function separately for readability. starts_with() is nice but I'll usually use matches() to use a regex.

paste_func <- function(vec) {
  na.omit(vec) |>
    paste(collapse = "; ") |>
    sub("^$", NA, x = _)
}

test |>
  rowwise() |>
  mutate(Games = paste_func(c_across(matches("^Games")))) |>
  ungroup()

#> # A tibble: 3 × 6
#>   ID    Gender     Games_Chess Games_Clue Games_Scrabble Games                
#>   <chr> <chr>      <chr>       <chr>      <chr>          <chr>                
#> 1 1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
#> 2 2     Male       <NA>        <NA>       <NA>           <NA>                 
#> 3 3     Non-Binary Chess       <NA>       Scrabble       Chess; Scrabble

For larger datasets where rowwise() might get a little slow, you can coerce a portion of the data frame to a matrix with apply() and assign results accordingly. Since apply() takes a matrix you'll want to only supply data of all the same class.

result <- test
cols_to_paste <- colnames(result)[grepl("^Games", colnames(result))]
result$Games <- apply(result[, cols_to_paste], 1, FUN = paste_func)

result

#> # A tibble: 3 × 6
#>   ID    Gender     Games_Chess Games_Clue Games_Scrabble Games                
#>   <chr> <chr>      <chr>       <chr>      <chr>          <chr>                
#> 1 1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
#> 2 2     Male       <NA>        <NA>       <NA>           <NA>                 
#> 3 3     Non-Binary Chess       <NA>       Scrabble       Chess; Scrabble

I usually find this more trouble than it's worth but you can also pivot_longer() to make column operations easier and then put it back to its original wide format with pivot_wider()

test |>
  tidyr::pivot_longer(matches("^Games")) |>
  group_by(ID) |>
  mutate(Games = paste(na.omit(value), collapse = "; ")) |>
  ungroup() |>
  tidyr::pivot_wider() |>
  mutate(Games = sub("^$", NA, Games)) |>
  relocate(all_of(colnames(test)))

#> # A tibble: 3 × 6
#>   ID    Gender     Games_Chess Games_Clue Games_Scrabble Games                
#>   <chr> <chr>      <chr>       <chr>      <chr>          <chr>                
#> 1 1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
#> 2 2     Male       <NA>        <NA>       <NA>           <NA>                 
#> 3 3     Non-Binary Chess       <NA>       Scrabble       Chess; Scrabble
like image 45
knitz3 Avatar answered Oct 21 '25 16:10

knitz3



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!