grepl with regex

Question

I am having difficulty using grepl with regex.

Here is a small example:

I have a character vector:

text <- c(
  "D_Purpose__Repairs",
  "Age",
  "F_Job"
)

I want to select the elements that start with D_ or F_, so I tried each of the following:

grepl("\>D_.+ | \>F_.+", text)

grepl("\D_.+ | \F_.+", text)

grepl("\^D_.+ | \^F_.+", text)

However this returns:

[1] FALSE FALSE FALSE

Could you help me understand what I am doing wrong and how I should correct my code?

Your advice will be appreciated.

Terran Melconian · Accepted Answer

You don't need to (and must not) escape the caret character with backslashes, and you can't put extra whitespace in your regex around the |. This works as you intend:

> grepl("^D_.+|^F_.+", text)
[1]  TRUE FALSE  TRUE
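
As a rough illustration of the two points above, the sketch below compares the corrected pattern with a variant that keeps the spaces around the |. The variable names fixed and spaced are only illustrative and are not from the question.

```r
text <- c("D_Purpose__Repairs", "Age", "F_Job")

# Unescaped carets anchor each alternative at the start of the string.
fixed <- grepl("^D_.+|^F_.+", text)
print(fixed)

# Spaces around | are part of the pattern: the first alternative now needs a
# trailing space, and the second needs a space *before* the start of the string.
spaced <- grepl("^D_.+ | ^F_.+", text)
print(spaced)

# The logical vector can be used to subset the original vector.
print(text[fixed])
```

Using the logical result as an index, as in the last line, is one common way to actually select the matching elements.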

Wiktor Stribiżew · Answer

Some comments on your patterns:

\>D_.+ | \>F_.+ - here, \> matches the end-of-word position, while the position you need is the start of a word (so you might want to try \< instead). Also, the spaces around | are meaningful; you should not add them unless you use perl=TRUE with the (?x) modifier.
\D_.+ | \F_.+ is a malformed pattern since \F is an unknown regex escape. \D matches any char but a digit, which is clearly not what you intended.
\^D_.+ | \^F_.+ is the closest, but there are redundant spaces again, and the escaped ^ matches a literal caret symbol. If you do not escape the carets, they match the start-of-string position.
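
A short sketch of the points above, assuming the default regex engine for the first two calls and PCRE (perl = TRUE) for the last one. Note that inside an R string literal each regex backslash has to be doubled. The names word_start, literal_caret, and free_spacing are only illustrative.

```r
text <- c("D_Purpose__Repairs", "Age", "F_Job")

# \< matches at the start of a word (written "\\<" in an R string).
word_start <- grepl("\\<D_.+|\\<F_.+", text)
print(word_start)

# An escaped caret matches a literal "^" character, not the start of the string.
literal_caret <- grepl("\\^D_", c("^D_x", "D_x"))
print(literal_caret)

# With perl = TRUE, the (?x) modifier makes whitespace in the pattern
# insignificant, so the spaces around | no longer matter.
free_spacing <- grepl("(?x) ^D_.+ | ^F_.+", text, perl = TRUE)
print(free_spacing)
```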

Now, the most efficient pattern here is

grepl("^[DF]_.+", text)

Meaning:

^ - start of string anchor
[DF] - either the letter D or the letter F
_ - a literal underscore
.+ - one or more chars, as many as possible, up to the end of the string.
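
To make the breakdown above concrete, here is a minimal sketch that applies the pattern to the vector from the question and to a few extra strings. The extra strings in edge_cases are made up for illustration and are not from the question.

```r
text <- c("D_Purpose__Repairs", "Age", "F_Job")
pattern <- "^[DF]_.+"

# Logical vector: which elements start with D_ or F_ followed by more text.
hits <- grepl(pattern, text)
print(hits)

# grep() with value = TRUE returns the matching elements directly.
selected <- grep(pattern, text, value = TRUE)
print(selected)

# Hypothetical edge cases: .+ needs at least one char after the underscore,
# and ^ rejects a D_ that is not at the very start of the string.
edge_cases <- grepl(pattern, c("D_", "F_x", "XD_y"))
print(edge_cases)
```

If an element consisting of just D_ or F_ should also count as a match, dropping the .+ from the pattern would be one way to allow that.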
