Filter according to partial match of string variable in R

Question

I have a data frame with a string column "disease". I want to filter the rows that partially match "trauma" or "Trauma". Currently I am doing the following with dplyr and stringr:

trauma_set <- df %>% filter(str_detect(disease, "trauma|Trauma"))

But the result also includes "Nontraumatic" and "nontraumatic". How can I keep only "trauma", "Trauma", "traumatic" or "Traumatic" without including the "nontrauma"/"Nontrauma" forms? Also, is there a way to define the string to detect without having to specify both the uppercase and lowercase versions (as in both trauma and Trauma)?
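A quick way to see the problem is to run the original pattern on a small hypothetical vector of the spellings in question. str_detect() matches the pattern anywhere in the string, and "Nontraumatic"/"nontraumatic" both contain the substring "trauma", so every element is kept:

```r
library(stringr)

# Hypothetical sample values covering the spellings from the question
x <- c("trauma", "Trauma", "traumatic", "Traumatic",
       "nontraumatic", "Nontraumatic")

# Unanchored pattern: "trauma" is also found inside "nontraumatic"
str_detect(x, "trauma|Trauma")
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE
```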

akrun · Accepted Answer

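To require a word boundary before the match, put \b at the start of the pattern (written "\\b" inside an R string, because a single backslash is an escape character there). To ignore case, wrap the pattern in the regex() modifier with ignore_case = TRUE: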

library(dplyr)
library(stringr)
out <- df %>%
  filter(str_detect(disease, regex("\\btrauma", ignore_case = TRUE)))

sum(str_detect(out$disease, regex("^Non", ignore_case = TRUE)))
#[1] 0
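The effect of \b and ignore_case can be checked on a plain character vector. This is a sketch on hypothetical values; the hyphenated "non-traumatic" is included to show a limitation: \b only requires a non-word character (or the start of the string) before "t", and a hyphen counts as one, so that spelling still matches. If such values can occur, one option (an assumption, not part of the answer above) is a negative lookbehind that rejects a preceding letter or hyphen:

```r
library(stringr)

# Hypothetical values, including a hyphenated spelling
x <- c("trauma", "Trauma", "traumatic", "Traumatic",
       "nontraumatic", "Nontraumatic", "non-traumatic")

# Word boundary before "trauma", case ignored via the regex() modifier
str_detect(x, regex("\\btrauma", ignore_case = TRUE))
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE

# Same idea using the inline (?i) flag instead of ignore_case = TRUE
str_detect(x, "(?i)\\btrauma")
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE

# Stricter variant: no letter or hyphen may directly precede "trauma"
str_detect(x, regex("(?<![a-z\\-])trauma", ignore_case = TRUE))
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
```

Note that the lookbehind variant would also drop values such as "post-traumatic", which may or may not be what you want.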

data

set.seed(24)
df <- data.frame(disease = sample(c("Nontraumatic", "Trauma", "Traumatic",
                                    "nontraumatic", "traumatic", "trauma"),
                                  50, replace = TRUE),
                 value = rnorm(50))
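If dplyr/stringr are not available, a base-R sketch of the same filter on the example data uses grepl() with perl = TRUE (PCRE supports \b) and ignore.case = TRUE. This is an alternative, not what the answer itself uses:

```r
# Recreate the example data from above
set.seed(24)
df <- data.frame(disease = sample(c("Nontraumatic", "Trauma", "Traumatic",
                                    "nontraumatic", "traumatic", "trauma"),
                                  50, replace = TRUE),
                 value = rnorm(50))

# Keep rows where "trauma" (any case) starts at a word boundary
out_base <- df[grepl("\\btrauma", df$disease, ignore.case = TRUE, perl = TRUE), ]

# No "Non..." values survive the filter
sum(grepl("^non", out_base$disease, ignore.case = TRUE))
#> [1] 0
```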


Tags: r, dplyr, stringr
