
R-regex: match strings not beginning with a pattern

I'd like to use a regex to check whether a string does not begin with a certain pattern. I can use a negated character class, [^...], to blacklist certain characters, but I can't figure out how to blacklist a pattern.

> grepl("^[^abc].+$", "foo")
[1] TRUE
> grepl("^[^abc].+$", "afoo")
[1] FALSE

I'd like to do something like grepl("^[^(abc)].+$", "afoo") and get TRUE, i.e. match only if the string does not start with the sequence abc.

Note that I'm aware of this post, and I also tried using perl = TRUE, but with no success:

> grepl("^((?!hede).)*$", "hede", perl = TRUE)
[1] FALSE
> grepl("^((?!hede).)*$", "foohede", perl = TRUE)
[1] FALSE

Any ideas?

aL3xa asked Dec 08 '11 21:12


2 Answers

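Yes. Put the zero-width lookahead outside the other parentheses, directly after the ^ anchor. In ^((?!hede).)*$ the lookahead is re-checked before every character, so the pattern rejects any string that contains hede anywhere. Placed once after ^, it only tests the start of the string. That should give you this: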

> grepl("^(?!hede).*$", "hede", perl = TRUE)
[1] FALSE
> grepl("^(?!hede).*$", "foohede", perl = TRUE)
[1] TRUE

which I think is what you want.

Alternatively, if you want to capture the entire string, ^(?!hede)(.*)$ and ^((?!hede).*)$ are equivalent, and either one works.
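
Applied to the abc prefix from the question, the same idea might look like the sketch below. The sample vector x is made up for illustration; the two alternatives after the lookahead (negating a plain match, and base R's startsWith()) are not part of the original answer, just other ways to get the same result for a literal prefix.

```r
# Hypothetical sample input; "abc" is the prefix from the question.
x <- c("afoo", "abcfoo", "fooabc", "abc")

# Negative lookahead anchored at the start of the string:
# TRUE when the string does not begin with "abc".
no_abc_lookahead <- grepl("^(?!abc)", x, perl = TRUE)

# Same test without a lookahead: match the prefix, then negate the result.
no_abc_negated <- !grepl("^abc", x)

# Fixed-string variant with base R's startsWith() (no regex involved).
no_abc_prefix <- !startsWith(x, "abc")

print(no_abc_lookahead)
```

Negating the result is usually the simplest option when you only need a TRUE/FALSE answer; the lookahead is mainly useful when the test has to live inside a larger pattern.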

Dan answered Nov 09 '22 11:11


There is now (years later) another option: str_detect() in the stringr package takes a negate argument.

library(stringr)

str_detect("dsadsf", "^abc", negate = TRUE)
#> [1] TRUE

str_detect("abcff", "^abc", negate = TRUE)
#> [1] FALSE

Created on 2020-01-13 by the reprex package (v0.3.0)
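
As a rough sketch of how this behaves on a whole vector (the sample strings are invented, and this assumes a stringr version recent enough to have the negate argument, 1.4.0 or later as far as I know): negate = TRUE should give the same result as negating the call yourself, and str_subset() accepts the same argument if you want the non-matching strings rather than a logical vector.

```r
library(stringr)

# Hypothetical sample input.
x <- c("dsadsf", "abcff", "xabc")

# TRUE for strings that do NOT start with "abc".
not_abc <- str_detect(x, "^abc", negate = TRUE)

# Equivalent: run the ordinary match and negate the result.
not_abc_manual <- !str_detect(x, "^abc")

# Keep only the strings that do not start with "abc".
kept <- str_subset(x, "^abc", negate = TRUE)

print(not_abc)
print(kept)
```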

pasipasi answered Nov 09 '22 11:11