Perl has a lovely modifier, /x, that ignores whitespace in regular expressions. That is to say, not that it matches regardless of whitespace, but rather that it omits whitespace from the interpretation of the regex unless it is escaped. For example, ^x[0-7][x-z][ABCpuq*]*$ could be written equivalently, but much more readably, as ^x [0-7] [x-z] [ABCpuq*]*$ in /x mode.
grep and its ilk in R seem to have no such mode, but given their Perl compatibility, is there an option to pass? I've tried a few options but no such luck:
> grepl( "^x[0-7][x-z][ABCpuq*]*$", "x5yuuA" )
[1] TRUE
> grepl( "^x [0-7] [x-z][ABCpuq*]*$", "x5yuuA" )
[1] FALSE
> grepl( "^x [0-7] [x-z][ABCpuq*]*$", "x5yuuA", perl=TRUE )
[1] FALSE
> grepl( "^x [0-7] [x-z][ABCpuq*]*$/x", "x5yuuA", perl=TRUE )
[1] FALSE
Secondary question: how directly do R's Perl-style regexes rely on the C PCRE library? There seems to be a PCRE_EXTENDED option bit that turns on ignoring whitespace.
Free-Spacing Mode
In R, to use free-spacing mode for an entire expression, pop the (?x) mode modifier at the beginning of your regex in PCRE mode (perl=TRUE).
Example (subject stands for the string or character vector you are testing):
grepl("(?x) # free spacing\r\n\\d # a digit\r\n[bc] # b or c", subject, perl=TRUE);
The (?x) modifier works in most regex flavors. Some exceptions: JavaScript, MySQL, Oracle, VBScript, XPath.
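Applied to the pattern from the question, this could look like the sketch below. The variable names are only illustrative. Two things are worth keeping in mind in free-spacing mode: a # starts a comment that runs to the end of the line, and whitespace inside a character class is still literal, which gives a handy way to match a real space.

```r
# The compact pattern from the question and a spaced-out equivalent.
compact <- "^x[0-7][x-z][ABCpuq*]*$"
spaced  <- "(?x) ^x [0-7] [x-z] [ABCpuq*]* $"

r_compact <- grepl(compact, "x5yuuA", perl = TRUE)
r_spaced  <- grepl(spaced,  "x5yuuA", perl = TRUE)

# The same pattern over several lines, with comments inside the regex.
commented <- paste(
  "(?x)          # free-spacing mode",
  "^x            # literal x at the start",
  "[0-7]         # one octal digit",
  "[x-z]         # one of x, y, z",
  "[ABCpuq*]*    # any run of these characters",
  "$             # end of string",
  sep = "\n"
)
r_commented <- grepl(commented, "x5yuuA", perl = TRUE)

# Unescaped blanks are dropped, so "a b" means "ab" here ...
r_no_space <- grepl("(?x) a b", "a b", perl = TRUE)
# ... but a blank inside a character class is still a literal space.
r_space    <- grepl("(?x) a [ ] b", "a b", perl = TRUE)

print(c(r_compact, r_spaced, r_commented, r_no_space, r_space))
# [1]  TRUE  TRUE  TRUE FALSE  TRUE
```

Building the commented version with paste(..., sep = "\n") is just one way to get real newlines into the pattern; a string literal spanning several lines would do the same job.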
Perl mode and PCRE
How closely does Perl mode rely on PCRE? Entirely. (That's a good thing. See below.)
From the R manual:
The perl = TRUE argument to grep, regexpr, gregexpr, sub, gsub and strsplit switches to the PCRE library that implements regular expression pattern matching using the same syntax and semantics as Perl 5.10, with just a few differences.
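If you want to see which PCRE library your own R build is using, something along these lines should work, assuming a reasonably recent R where extSoftVersion() and pcre_config() are available in base. The output depends on the installation, so none is shown.

```r
# Version string of the PCRE library this R build is linked against.
pcre_version <- extSoftVersion()[["PCRE"]]
print(pcre_version)

# Features the linked PCRE library was built with (UTF-8 support, JIT, ...).
pcre_features <- pcre_config()
print(pcre_features)
```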
Some Refinements
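You can turn free-spacing mode on with (?x) at any point in the regex.
You can turn it off again with (?-x).
You can turn it on for a subexpression only with (?x: \w \d)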
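A small sketch of these three forms in R, with made-up test strings. In each pair, the first subject has spaces exactly where the pattern treats them as literal, and the second does not.

```r
# (?x) switched on mid-pattern: the blank in "a b" is literal,
# the blanks after (?x) are ignored, so the pattern means "a bcd".
on_mid_hit  <- grepl("a b(?x) c d", "a bcd",  perl = TRUE)
on_mid_miss <- grepl("a b(?x) c d", "a bc d", perl = TRUE)

# (?-x) switches it off again: the blank before "c" must be matched.
off_hit  <- grepl("(?x) a b (?-x) c", "ab c", perl = TRUE)
off_miss <- grepl("(?x) a b (?-x) c", "abc",  perl = TRUE)

# (?x: ... ) limits free-spacing to the group: the blank before "z" is literal.
scoped_hit  <- grepl("(?x: \\w \\d ) z", "a1 z", perl = TRUE)
scoped_miss <- grepl("(?x: \\w \\d ) z", "a1z",  perl = TRUE)

print(c(on_mid_hit, on_mid_miss, off_hit, off_miss, scoped_hit, scoped_miss))
# [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
```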
In Praise of PCRE
Having access to PCRE is a good thing.
PCRE is one of the contenders for the title of very best Perl-style engine, along with .NET, Matthew Barnett's regex module for Python, and Perl itself. It is widely used in high-visibility environments (Apache, PHP, Notepad++) so it gets a lot of attention. Among other treats, it gives you access to exotic features such as:
\K to "Keep Out" what has been matched so far from the returned match
(*SKIP)(*F) and other backtracking control verbs
(?(DEFINE)... , which can help you refactor a complex regex
What's missing in PCRE?
Balancing groups. That will probably never happen because balancing groups are often seen as recursion's poor brother... However, they allow you to do other things, such as easily setting up counters.
One feature of the regex module (can't comment as I haven't used that feature).
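To give a feel for the PCRE features named above, here is a rough sketch of each from R with perl=TRUE. The strings and the octet sub-pattern are invented for illustration and are not meant as production-grade validation.

```r
# \K: "foo" must be present but is kept out of the match, so only "bar" is replaced.
kept <- sub("foo\\Kbar", "X", "foobar", perl = TRUE)

# (*SKIP)(*F): match quoted chunks first and throw them away,
# so only digits outside the quotes are replaced.
skipped <- gsub('"[^"]*"(*SKIP)(*F)|\\d+', "N", 'a1 "b2" c3', perl = TRUE)

# (?(DEFINE)...): define a named sub-pattern once and reuse it with (?&name).
# Written in free-spacing mode so the pieces are easier to read.
ip_rx <- paste(
  "(?x)",
  "(?(DEFINE) (?<octet> 25[0-5] | 2[0-4]\\d | 1?\\d?\\d ) )  # reusable piece",
  "^ (?&octet) (?: \\. (?&octet) ){3} $",
  sep = "\n"
)
ip_ok  <- grepl(ip_rx, "192.168.0.1", perl = TRUE)
ip_bad <- grepl(ip_rx, "256.1.1.1",   perl = TRUE)

print(kept)     # [1] "fooX"
print(skipped)  # [1] "aN \"b2\" cN"
print(c(ip_ok, ip_bad))
# [1]  TRUE FALSE
```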