I need to extract whole sentences whose middle word is a specific word, in R. Below is the code I am trying to use, but it does not give the desired result. I am new to regular expressions in R. I want to extract the sentences where the middle word is 'arent'.
yy <- c("computers arent working", "arent not wkng","scanner arent good","arent scanner good")
m <- gregexpr('\\w arent ', yy)
regmatches(yy, m)
The above code does not give what I want. My desired output is:
"computers arent working", "scanner arent good"
Thanks for your help!
I suggest
grep("\\w\\W+arent\\W+\\w", yy, value = TRUE)
grep finds all the strings that contain a match for the regex pattern (a partial match is enough) and, because value is set to TRUE, returns the matching strings themselves rather than their indices.
The regex pattern matches arent only when it sits between word (\w) characters and is separated from them by one or more non-word (\W+) characters.
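For contrast, here is a small sketch of what the gregexpr/regmatches attempt from the question returns. regmatches extracts only the matched substring, not the whole string, which is why the original code did not produce the desired output. The variable name parts is only illustrative.

```r
yy <- c("computers arent working", "arent not wkng", "scanner arent good", "arent scanner good")

# Pattern from the question: one word char, a space, "arent", a space
m <- gregexpr("\\w arent ", yy)

# regmatches returns only the matched pieces, one list element per input string
parts <- regmatches(yy, m)
print(parts[[1]])  # "s arent " (a fragment, not the whole sentence)
print(parts[[2]])  # character(0), because "arent" is the first word here

# grep with value = TRUE returns the whole strings that contain a match
whole <- grep("\\w\\W+arent\\W+\\w", yy, value = TRUE)
print(whole)
```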
R demo:
yy <- c("computers arent working", "arent not wkng","scanner arent good","arent scanner good")
grep("\\w\\W+arent\\W+\\w", yy, value = TRUE)
## => [1] "computers arent working" "scanner arent good"
If the word you want to match MUST be surrounded by whitespace, replace \\W+ with \\s+ (one or more whitespace characters).
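A minimal sketch of the difference between the two variants. The extra string "printer, arent, fine" is not from the question; it is a made-up input added only to show a case where the separators are not plain whitespace.

```r
# Last element is a hypothetical input with punctuation around the word
yy <- c("computers arent working", "arent not wkng", "scanner arent good",
        "arent scanner good", "printer, arent, fine")

# \W+ accepts any run of non-word chars (spaces, commas, ...) as separators
loose <- grep("\\w\\W+arent\\W+\\w", yy, value = TRUE)

# \s+ accepts whitespace only, so the comma-separated string is excluded
strict <- grep("\\w\\s+arent\\s+\\w", yy, value = TRUE)

print(loose)
print(strict)
```

With this input, loose should keep the comma-separated string while strict should drop it; both still reject the strings where arent is the first word.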