I'm trying to create a new column that concatenates all values of a specific set of columns that contain a specific string in their names, with a semi-colon separator. I'm working within dplyr
, so I'm seeking a tidyverse
solution.
I attempted to use grepl()
in combination with mutate()
, case_when()
, and paste()
to identify the columns that contained the string I wanted in their names ("Games"
) and concatenate their contents together into a new column. When that failed, I attempted to use str_detect
instead, but that didn't work out.
As far as I can tell, my issue is that I can't correctly instruct the code to evaluate all column names and then return the ones that have a string that contains my specified pattern. I've tried using contains("Games")
, colnames(.x)
, and other variations on those arguments. I know I could do this if I specifically named every column I wished to paste together, but I prefer a relative solution that prevents me from typing out multiple names.
Thank you!
# Sample Data
test<-as_tibble(data.frame(`ID` = c("1","2","3"),
`Gender` = c("Female","Male","Non-Binary"),
`Games_Chess`=c("Chess",NA,"Chess"),
`Games_Clue`=c("Clue",NA,NA),
`Games_Scrabble`=c("Scrabble",NA,"Scrabble")))
# A tibble: 3 × 5
ID Gender Games_Chess Games_Clue Games_Scrabble
<chr> <chr> <chr> <chr> <chr>
1 1 Female Chess Clue Scrabble
2 2 Male NA NA NA
3 3 Non-Binary Chess NA Scrabble
# Desired Output
ID Gender Games_Chess Games_Clue Games_Scrabble Games
1 Female Chess Clue Scrabble Chess; Clue; Scrabble
2 Male NA NA NA NA
3 Non-Binary Chess NA Scrabble Chess; Scrabble
# Attempted Code 1
test<-test%>%
mutate(`Games` = case_when(str_detect(colnames(test),"Games") ~ paste(.x, collapse = ";"), TRUE ~ NA))
# Error Code 1
Error in `mutate()`:
ℹ In argument: `Games = case_when(...)`.
Caused by error in `case_when()`:
! Failed to evaluate the right-hand side of formula 1.
Caused by error:
! object '.x' not found
# Attempted Code 2
test<-test%>%
mutate(`Games` = case_when(grepl("Games",.) ~ paste(., collapse = ";"), TRUE ~ NA))
# Error Code 2
Error in `mutate()`:
ℹ In argument: `Games = case_when(...)`.
Caused by error:
! `Games` must be size 3 or 1, not 4.
Run `rlang::last_trace()` to see where the error occurred.
Though not entirely dplyr
-based, the following solution does still fall under the tidyverse
using tidyr
:
test %>%
tidyr::unite(
# Name of new column
col = "Games",
# Select columns to unite using tidy-select syntax
dplyr::starts_with("Games"),
# Specify semi-colon as separator
sep = "; ",
# Keep original Games_* columns
remove = FALSE,
# Remove NA's prior to concatenation
na.rm = TRUE
)
Which results in the following output:
#> ID Gender Games Games_Chess Games_Clue Games_Scrabble
#> 1 1 Female Chess; Clue; Scrabble Chess Clue Scrabble
#> 2 2 Male <NA> <NA> <NA>
#> 3 3 Non-Binary Chess; Scrabble Chess <NA> Scrabble
Edit: optionally, if you'd like the new column to be at the very end of your data, then you can add the following code at the end using the %>%
pipe operator to move the new column to the end:
dplyr::relocate(
Games,
.after = dplyr::everything()
)
rowwise()
is usually considered slow, but it's perfectly fine to use unless you're dealing with large amounts of data
library(dplyr)
test |>
rowwise() |>
mutate(Games = paste(na.omit(c_across(c(starts_with("Games")))), collapse = "; ")) |>
ungroup() |>
mutate(Games = sub("^$", NA, Games))
#> # A tibble: 3 × 6
#> ID Gender Games_Chess Games_Clue Games_Scrabble Games
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Female Chess Clue Scrabble Chess; Clue; Scrabble
#> 2 2 Male <NA> <NA> <NA> <NA>
#> 3 3 Non-Binary Chess <NA> Scrabble Chess; Scrabble
You can also define a complicated function separately for readability. starts_with()
is nice but I'll usually use matches()
to use a regex.
paste_func <- function(vec) {
na.omit(vec) |>
paste(collapse = "; ") |>
sub("^$", NA, x = _)
}
test |>
rowwise() |>
mutate(Games = paste_func(c_across(matches("^Games")))) |>
ungroup()
#> # A tibble: 3 × 6
#> ID Gender Games_Chess Games_Clue Games_Scrabble Games
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Female Chess Clue Scrabble Chess; Clue; Scrabble
#> 2 2 Male <NA> <NA> <NA> <NA>
#> 3 3 Non-Binary Chess <NA> Scrabble Chess; Scrabble
For larger datasets where rowwise()
might get a little slow, you can coerce a portion of the data frame to a matrix with apply()
and assign results accordingly. Since apply()
takes a matrix you'll want to only supply data of all the same class.
result <- test
cols_to_paste <- colnames(result)[grepl("^Games", colnames(result))]
result$Games <- apply(result[, cols_to_paste], 1, FUN = paste_func)
result
#> # A tibble: 3 × 6
#> ID Gender Games_Chess Games_Clue Games_Scrabble Games
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Female Chess Clue Scrabble Chess; Clue; Scrabble
#> 2 2 Male <NA> <NA> <NA> <NA>
#> 3 3 Non-Binary Chess <NA> Scrabble Chess; Scrabble
I usually find this more trouble than it's worth but you can also pivot_longer()
to make column operations easier and then put it back to its original wide format with pivot_wider()
test |>
tidyr::pivot_longer(matches("^Games")) |>
group_by(ID) |>
mutate(Games = paste(na.omit(value), collapse = "; ")) |>
ungroup() |>
tidyr::pivot_wider() |>
mutate(Games = sub("^$", NA, Games)) |>
relocate(all_of(colnames(test)))
#> # A tibble: 3 × 6
#> ID Gender Games_Chess Games_Clue Games_Scrabble Games
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 Female Chess Clue Scrabble Chess; Clue; Scrabble
#> 2 2 Male <NA> <NA> <NA> <NA>
#> 3 3 Non-Binary Chess <NA> Scrabble Chess; Scrabble
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With