
Need to extract whole sentences where middle word begins with a specific word in R

Tags: regex, r

I need to extract whole sentences in R where the middle word matches a specific word. Below is the code I am trying to use, but it does not give the desired result. I am new to regular expressions in R. I want to extract the sentences where the middle word is 'arent'.

  yy <- c("computers arent working", "arent not wkng","scanner arent good","arent scanner good")
  m <- gregexpr('\\w arent ', yy)
  regmatches(yy, m)

The code above does not give what I want. My desired output is:

 "computers arent working", "scanner arent good"

Thanks for your help!

Kiwi asked Nov 17 '25 15:11


1 Answer

I suggest

grep("\\w\\W+arent\\W+\\w", yy, value = TRUE)

grep finds all the strings that contain a match for the regex pattern (a partial match is enough) and, because value is set to TRUE, returns the strings themselves rather than their indices.

The regex pattern matches arent only when it sits between word (\w) chars and is separated from them by 1+ non-word (\W+) chars on each side, so arent cannot be the first or the last word of the string.

R demo:

yy <- c("computers arent working", "arent not wkng","scanner arent good","arent scanner good")
grep("\\w\\W+arent\\W+\\w", yy, value = TRUE)
## => [1] "computers arent working" "scanner arent good" 
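
For context on why the attempt in the question returned fragments, here is a small sketch using the same yy vector. regmatches only extracts the matched substrings, while grepl yields one logical value per string that can be used as a filter. The variable names fragments, hits and whole are made up for illustration.

```r
yy <- c("computers arent working", "arent not wkng",
        "scanner arent good", "arent scanner good")

# regmatches() returns only the matched substrings, not the whole strings
fragments <- regmatches(yy, gregexpr("\\w arent ", yy))
unlist(fragments)
## => [1] "s arent " "r arent "

# grepl() gives one TRUE/FALSE per input string, usable as a filter
hits <- grepl("\\w\\W+arent\\W+\\w", yy)
hits
## => [1]  TRUE FALSE  TRUE FALSE

# Indexing with the logical vector keeps the whole matching strings
whole <- yy[hits]
```

Indexing with grepl should give the same strings as grep(..., value = TRUE); which one to use is mostly a matter of taste.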

If the word you want to match must be surrounded by whitespace, replace \\W+ with \\s+ (1 or more whitespace characters).
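
To see where the two variants differ, here is a minimal sketch. The zz vector is a hypothetical input (not from the question) with a comma before the target word, and loose and strict are illustrative names.

```r
# Hypothetical input: the second string has punctuation before "arent"
zz <- c("computers arent working", "scanner, arent good")

# \\W+ accepts any run of non-word characters, so ", " counts as a separator
loose <- grep("\\w\\W+arent\\W+\\w", zz, value = TRUE)

# \\s+ accepts whitespace only; the comma before the space is not a word
# character, so the second string no longer matches
strict <- grep("\\w\\s+arent\\s+\\w", zz, value = TRUE)

length(loose)   # both strings match
length(strict)  # only the first string matches
```

Which variant fits depends on whether punctuation next to the word should still count as a word boundary in your data.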

Wiktor Stribiżew answered Nov 19 '25 08:11


