 

regex: "(^|)" vs "(|^)"

Tags: regex, r

I have a rather specific question about regular expressions in R:

grepl("(|^)over","stackoverflow")
# [1] TRUE

grepl("(^|)over","stackoverflow")
# [1] FALSE

grepl("(^|x|)over","stackoverflow")
# [1] FALSE

grepl("(x|^|)over","stackoverflow")
# [1] FALSE

grepl("(x||^)over","stackoverflow")
# [1] TRUE

Why don't all of these expressions evaluate to TRUE?

asked Mar 9 '16 at 23:03 by Daniel Gerigk



1 Answer

POSIX regular expressions should actually make all of those TRUE. It appears that R uses a slightly modified version of Ville Laurikari's TRE library, which doesn't quite follow the standard here. I'd follow @rawr's recommendation and use perl = TRUE, which switches grepl() to the PCRE engine; with it, all five patterns return TRUE.
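A minimal sketch, assuming only base R, that re-runs the question's five patterns with perl = TRUE. The helper names patterns, text and pcre_results are illustrative. Each pattern's group contains an empty alternative, so under PCRE the group can match the empty string immediately before "over", and every pattern matches. The default (TRE) results are deliberately not checked here, because they depend on R's bundled TRE version.

```r
# The five patterns from the question; each group has an empty alternative,
# so the group can always match "" right before "over".
patterns <- c("(|^)over", "(^|)over", "(^|x|)over", "(x|^|)over", "(x||^)over")
text <- "stackoverflow"

# perl = TRUE makes grepl() use PCRE instead of the default TRE engine.
# vapply() passes each pattern as grepl()'s first (pattern) argument.
pcre_results <- vapply(patterns, grepl, logical(1), x = text, perl = TRUE)

print(pcre_results)  # all five are TRUE
```

Because an empty alternative makes the whole group optional, each of these patterns matches exactly the strings that plain "over" matches. If the group was not needed for a capture, dropping it sidesteps the engine difference entirely.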

See also: When both halves of an OR regex group match, is it defined which will be chosen?

answered Oct 22 '22 at 22:10 by Allen Luce