I am having difficulty using grepl with a regex.
Here is a small example:
I have a character vector:
text <- c(
"D_Purpose__Repairs" ,
"Age" ,
"F_Job"
)
And I want to select the words that start with D_ or F_. So I write:
grepl("\\>D_.+ | \\>F_.+", text)
grepl("\\D_.+ | \\F_.+", text)
grepl("\\^D_.+ | \\^F_.+", text)
However this returns:
[1] FALSE FALSE FALSE
Could you help me understand what I am doing wrong and how I should correct my code?
Your advice will be appreciated.
You don't need to (and must not) escape the caret character with backslashes, and you can't put extra whitespace in your regex around the |. This works as you intend:
> grepl("^D_.+|^F_.+", text)
[1] TRUE FALSE TRUE
Some comments on your patterns:
\>D_.+ | \>F_.+ - here, \> matches the end-of-word position, while the position you want is the start of a word (so you might want to try \< instead). Also, the spaces around | are meaningful; you should not add them unless you use perl=TRUE with a (?x) modifier.
\D_.+ | \F_.+ is a malformed pattern since \F is an unknown regex escape. \D matches any character except a digit, which is clearly not what you intended.
\^D_.+ | \^F_.+ is the closest, but there are redundant spaces again, and each escaped ^ matches a literal caret symbol. If you do not escape the carets, they match the start-of-string position.
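To make the points about whitespace and word boundaries concrete, here is a small sketch using the same vector as in the question. The variable names with_spaces, free_spacing and word_start are only illustrative, and the word-boundary pattern is one possible way to use \<, not necessarily what you need.

```r
text <- c("D_Purpose__Repairs", "Age", "F_Job")

# Spaces around | are part of the pattern: the first alternative needs a
# trailing space, the second needs a space before the start of the string.
with_spaces <- grepl("^D_.+ | ^F_.+", text)
print(with_spaces)   # FALSE FALSE FALSE

# perl = TRUE plus (?x) makes PCRE ignore unescaped whitespace in the pattern.
free_spacing <- grepl("(?x) ^D_.+ | ^F_.+", text, perl = TRUE)
print(free_spacing)  # TRUE FALSE TRUE

# \\< anchors at the start of a word rather than the start of the string.
word_start <- grepl("\\<[DF]_", text)
print(word_start)    # TRUE FALSE TRUE
```

Note that \< would also match a D_ or F_ that begins a later word inside a longer string, so ^ is the safer anchor when you mean the start of the whole string.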
Now, the most efficient pattern here is
grepl("^[DF]_.+", text)
Meaning:
^ - start-of-string anchor
[DF] - either the letter D or the letter F
_ - a literal underscore
.+ - any one or more characters, as many as possible, up to the end of the string.
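Since the goal was to select the matching words, the pattern can be used as follows. This is a minimal sketch with the vector from the question; the names selected, same_result and bare_prefix are only illustrative.

```r
text <- c("D_Purpose__Repairs", "Age", "F_Job")

# grep() with value = TRUE returns the matching elements themselves.
selected <- grep("^[DF]_.+", text, value = TRUE)
print(selected)      # "D_Purpose__Repairs" "F_Job"

# Equivalent: use the logical vector from grepl() as a subset index.
same_result <- text[grepl("^[DF]_.+", text)]

# Because .+ requires at least one character, a bare prefix does not match.
bare_prefix <- grepl("^[DF]_.+", "D_")
print(bare_prefix)   # FALSE
```

If a bare D_ or F_ should also count as a match, dropping the .+ and using "^[DF]_" would be enough, since grepl only needs to find the prefix.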