I have a data frame with a string column "disease". I want to filter the rows that partially match "trauma" or "Trauma". I am currently doing the following using dplyr and stringr:
trauma_set <- df %>% filter(str_detect(disease, "trauma|Trauma"))
But the result also includes "Nontraumatic" and "nontraumatic". How can I keep only "trauma", "Trauma", "traumatic" or "Traumatic" without including "nontrauma" or "Nontrauma"? Also, is there a way to define the string to detect without having to specify both the uppercase and lowercase versions (as in both "trauma" and "Trauma")?
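To require a word boundary, put \\b at the start of the pattern. "Nontraumatic" then no longer matches, because there is no boundary between the "n" and the "t" inside the word. To ignore case, wrap the pattern in regex() with ignore_case = TRUE: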
library(dplyr)
library(stringr)
out <- df %>%
  filter(str_detect(disease, regex("\\btrauma", ignore_case = TRUE)))
# Check: count the filtered rows whose value still starts with "Non" (any case)
sum(str_detect(out$disease, regex("^Non", ignore_case = TRUE)))
#[1] 0
# Example data (create df before running the code above)
set.seed(24)
df <- data.frame(disease = sample(c("Nontraumatic", "Trauma",
    "Traumatic", "nontraumatic", "traumatic", "trauma"), 50,
    replace = TRUE), value = rnorm(50))
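To see what the word boundary does in isolation, here is a minimal sketch on a small character vector. The values in x are hypothetical and chosen only for illustration; they are not part of the data above. The base R grepl() line is an assumed equivalent added for comparison, not something the answer itself uses.

```r
library(stringr)

# Hypothetical values for illustration only (not from the original data)
x <- c("Trauma", "traumatic", "Nontraumatic", "nontraumatic",
       "head trauma", "post-traumatic")

# \\b requires a word boundary immediately before "trauma";
# ignore_case = TRUE makes the match case-insensitive
hits <- str_detect(x, regex("\\btrauma", ignore_case = TRUE))
x[hits]

# Assumed base R equivalent, using a Perl-compatible regex
hits_base <- grepl("\\btrauma", x, ignore.case = TRUE, perl = TRUE)
identical(hits, hits_base)
```

Note that "post-traumatic" matches, because a hyphen is a non-word character and so creates a boundary before "trauma". For the same reason a hyphenated spelling such as "non-traumatic" would also match; if the data may contain that form, it would need to be excluded separately.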