grepl with regex

Tags:

regex

r

grepl

I am having difficulty using grepl with a regex.

Here is a small example:

I have a character vector:

text <- c(
  "D_Purpose__Repairs",
  "Age",
  "F_Job"
)

I want to select the words that start with D_ or F_, so I tried each of the following:

grepl("\\>D_.+ | \\>F_.+", text)

grepl("\\D_.+ | \\F_.+", text)

grepl("\\^D_.+ | \\^F_.+", text)

However, each of these returns:

[1] FALSE FALSE FALSE

Could you help me understand what I am doing wrong and how I should correct my code?

Your advice will be appreciated.

rf7 asked Feb 06 '26 19:02


2 Answers

You don't need to (and must not) escape the caret character with backslashes, and you can't put extra whitespace in your regex around the |. This works as you intend:

> grepl("^D_.+|^F_.+", text)
[1]  TRUE FALSE  TRUE
Terran Melconian answered Feb 08 '26 08:02


Some comments on your patterns:

  • \>D_.+ | \>F_.+ - here, \> matches the end-of-word position, while the position you need is the start of a word (so you might want to try \< instead). Also, the spaces around | are meaningful: they are matched literally, so you should not add them unless you use perl=TRUE with the (?x) modifier.

  • \D_.+ | \F_.+ is a malformed pattern since \F is an unknown regex escape. \D matches any character except a digit, which is clearly not what you intended.

  • \^D_.+ | \^F_.+ is the closest, but there are redundant spaces again, and the escaped ^ matches a literal caret symbol. If you do not escape the carets, they match the start-of-string position.
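
The points above can be checked directly in an R session. This is only a small sketch: the extra test strings ("^D_x", "D_x") are made up for illustration and are not part of the question's data.

```r
text <- c("D_Purpose__Repairs", "Age", "F_Job")

# A space inside the pattern is matched literally, so the first
# alternative below needs a space after the D_ word and fails,
# while the second alternative (no stray space) still matches "F_Job".
literal_space <- grepl("^D_.+ |^F_.+", text)

# An escaped caret matches a literal "^" character,
# not the start-of-string position.
escaped_caret <- grepl("\\^D_", c("^D_x", "D_x"))

# With perl = TRUE, the (?x) modifier tells the engine to ignore
# unescaped whitespace in the pattern, so the spaces become harmless.
free_spacing <- grepl("(?x) ^D_.+ | ^F_.+", text, perl = TRUE)

# \< anchors at the start of a word (the counterpart of \>).
word_start <- grepl("\\<D_.+|\\<F_.+", text)

print(literal_space)
print(escaped_caret)
print(free_spacing)
print(word_start)
```

Note that \< matches at the start of any word, not only at the start of the string, so ^ remains the safer anchor when the prefix must be at the very beginning.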

Now, the most efficient pattern here is

grepl("^[DF]_.+", text)

Meaning:

  • ^ - start of string anchor
  • [DF] - either D or F letters
  • _ - a literal underscore
  • .+ - one or more characters, as many as possible, up to the end of the string.
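
Since the goal in the question is to select the matching words, here is a short sketch of how the logical vector from grepl can be used for that. The variable names (keep, selected, selected_grep, has_prefix) are only illustrative, and the startsWith variant is an alternative I am adding on the assumption that the prefixes are fixed strings; it requires R 3.3.0 or later.

```r
text <- c("D_Purpose__Repairs", "Age", "F_Job")

# grepl returns a logical vector that can index the original vector.
keep <- grepl("^[DF]_.+", text)
selected <- text[keep]

# grep with value = TRUE returns the matching elements directly.
selected_grep <- grep("^[DF]_.+", text, value = TRUE)

# For fixed prefixes, base R's startsWith avoids regex altogether.
# Unlike the pattern above, it does not require at least one
# character after the underscore.
has_prefix <- startsWith(text, "D_") | startsWith(text, "F_")

print(selected)
print(selected_grep)
print(text[has_prefix])
```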
Wiktor Stribiżew answered Feb 08 '26 08:02


