Search for string across entire row of a tibble?

I'm trying to clean up a sample information sheet that comes from a lot of different groups and thus the treatment information I care about may be located in any number of different columns. Here's an abstracted example:

library(tidyverse)  # provides tribble(), mutate(), str_detect(), pmap_lgl()

sample_info = tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

where I want to find "find_me" 1) case-insensitively, 2) in any column, and 3) even when it is part of a larger string. I want to create one column that's TRUE or FALSE depending on whether "find_me" was found in any of the columns. How can I do this? (I've thought of uniting all the columns and then just running str_detect on that mess, but there must be a less hacky way, right?)
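
For reference, the unite-based workaround mentioned above might look roughly like the sketch below. It assumes tidyr's unite() with na.rm = TRUE (available in tidyr 1.0.0 and later), and the helper column name all_text is made up for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

sample_info <- tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

united <- sample_info %>%
  # Paste every non-id column into one helper column (hypothetical name),
  # dropping NAs and keeping the original columns.
  unite("all_text", -id, sep = " ", remove = FALSE, na.rm = TRUE) %>%
  # Search the pasted string once per row, ignoring case.
  mutate(find_me = str_detect(all_text, regex("find_me", ignore_case = TRUE))) %>%
  # Drop the helper column again.
  select(-all_text)
```

The downside is that pasting columns together can create matches that span two cells if the separator happens to fit the pattern, which is part of why it feels hacky.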

To be clear, I would want a final tibble that's equivalent to sample_info %>% mutate(find_me = c(TRUE, TRUE, TRUE, FALSE)).

I expect that I would want to use something like stringr::str_detect(., regex('find_me', ignore_case = T)) and pmap_lgl(any(c(...) <insert logic check>)) like in the similar cases linked below, but I'm not sure how to put them together into a mutate-compatible statement.

Things I've looked through:
Row-wise operation to see if any columns are in any other list

R: How to ignore case when using str_detect?

in R, check if string appears in row of dataframe (in any column)

GenesRus asked Dec 18 '22 11:12


1 Answer

One dplyr and purrr option could be:

sample_info %>%
 mutate(find_me = pmap_lgl(across(-id), ~ any(str_detect(c(...), regex("find_me", ignore_case = TRUE)), na.rm = TRUE)))

     id could_be_here or_here  or_even_in_this_one find_me
  <dbl> <chr>         <chr>    <chr>               <lgl>  
1     1 <NA>          not_me   find_me_other_stuff TRUE   
2     2 Extra_Find_Me <NA>     diff_stuff          TRUE   
3     3 <NA>          Find_me  <NA>                TRUE   
4     4 <NA>          not_here not_here_either     FALSE
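
To see why na.rm = TRUE matters here, it may help to run the inner expression by hand on the values of a single row. The sketch below copies rows 1 and 4 of sample_info into plain vectors; the variable names are only for illustration.

```r
library(stringr)

pattern <- regex("find_me", ignore_case = TRUE)

# Non-id values of row 1 and row 4, i.e. what c(...) receives for those rows
row1 <- c(NA, "not_me", "find_me_other_stuff")
row4 <- c(NA, "not_here", "not_here_either")

row1_hits <- str_detect(row1, pattern)  # NA FALSE TRUE
row4_hits <- str_detect(row4, pattern)  # NA FALSE FALSE

any(row1_hits, na.rm = TRUE)  # TRUE
any(row4_hits)                # NA, because the missing cell might have matched
any(row4_hits, na.rm = TRUE)  # FALSE
```

Without na.rm = TRUE, any row that has a missing cell and no match would come back as NA rather than FALSE.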

Or, using just dplyr (plus stringr for the matching):

sample_info %>%
 rowwise() %>%
 mutate(find_me = any(str_detect(c_across(-id), regex("find_me", ignore_case = TRUE)), na.rm = TRUE))
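
A third possibility, not part of the options above, is a column-wise sketch with if_any(). It assumes dplyr 1.0.4 or later, where if_any() is available. Because str_detect() returns NA for missing cells, coalesce() is used here to turn those into FALSE before the columns are combined.

```r
library(dplyr)
library(stringr)
library(tibble)

sample_info <- tribble(
  ~id, ~could_be_here,  ~or_here,   ~or_even_in_this_one,
  1,   NA,              "not_me",   "find_me_other_stuff",
  2,   "Extra_Find_Me", NA,         "diff_stuff",
  3,   NA,              "Find_me",  NA,
  4,   NA,              "not_here", "not_here_either"
)

pattern <- regex("find_me", ignore_case = TRUE)

result <- sample_info %>%
  # Test each non-id column as a whole vector, treat NA as "no match",
  # then combine the per-column results with a row-wise OR.
  mutate(find_me = if_any(-id, ~ coalesce(str_detect(.x, pattern), FALSE)))
```

Since each column is processed as a whole vector rather than row by row, this may scale better than rowwise() on large tibbles, though that is worth measuring on your own data.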

tmfmnk answered Dec 20 '22 01:12