I wonder if there is a comparison between the features of various regex metacharacters in various implementations.
The sort of thing I am looking for is a table like:
Language    Perl    sed
grouping    ( )     \( \)
The languages I am interested in are Perl, sed, Java, and JavaScript.
Regular expressions (RE), as defined by POSIX, come in two flavors: extended regular expressions (ERE) and basic regular expressions (BRE).
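To make the BRE/ERE difference concrete, here is a minimal Python sketch that rewrites a simple POSIX BRE into the equivalent ERE by swapping which of `( ) { }` need a backslash. The function name `bre_to_ere` is hypothetical and written only for this illustration. It assumes plain POSIX BRE input and deliberately ignores bracket expressions, GNU extensions such as `\+` and `\|`, and the positional rules for `*`, `^` and `$`.

```python
import re

# Characters whose special meaning is toggled by a backslash between BRE and ERE.
TOGGLED = set("(){}")

def bre_to_ere(pattern: str) -> str:
    """Convert a simple POSIX BRE to an ERE (illustrative sketch only).

    Not handled: bracket expressions such as [(], GNU BRE extensions,
    and context-dependent rules for *, ^ and $.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( \) \{ \} are operators in BRE and lose the backslash in ERE;
            # every other escape sequence is copied through unchanged.
            out.append(nxt if nxt in TOGGLED else ch + nxt)
            i += 2
        elif ch in TOGGLED or ch in "+?|":
            # These are literals in BRE but operators in ERE, so escape them.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

if __name__ == "__main__":
    ere = bre_to_ere(r"\(ab\)*c")
    print(ere)
    # Python's re syntax accepts this simple ERE unchanged.
    print(bool(re.fullmatch(ere, "ababc")))
```

The same grouping that sed's default BRE syntax spells `\(ab\)` is spelled `(ab)` in ERE and in Perl-style engines, which is the row the question's table sketches.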
As a result, broadly speaking, there are three types of regex engines: DFA (POSIX or not; similar either way), Traditional NFA (most common: Perl, .NET, PHP, Java, Python, ...), and POSIX NFA.
R supports two regular expression flavors: POSIX 1003.2 and Perl. Regular expression functions in R take two arguments: extended, which defaults to TRUE, and perl, which defaults to FALSE.
Avoid coding in regex if you can. In programming, only use regular expressions as a last resort. Don't solve important problems with regex. Regex is expensive: it is often the most CPU-intensive part of a program, and a non-matching regex can be even more expensive to check than a matching one.
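The claim about non-matching input can be seen with a backtracking engine such as Python's `re`. The sketch below uses the nested-quantifier pattern `(a+)+$`, chosen purely for illustration, and a hypothetical helper `timed_match`; actual timings depend on the machine and Python version, so none are quoted here.

```python
import re
import time

# Nested quantifiers: a classic pattern that backtracks heavily on a near-miss.
PATTERN = re.compile(r"(a+)+$")

def timed_match(text: str):
    """Return (matched, elapsed_seconds) for PATTERN against text."""
    start = time.perf_counter()
    matched = PATTERN.match(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    n = 18  # keep this small: the failing case roughly doubles in cost per extra "a"
    hit, hit_s = timed_match("a" * n)
    miss, miss_s = timed_match("a" * n + "b")
    print(f"match:    {hit}  {hit_s:.6f}s")
    print(f"no match: {miss}  {miss_s:.6f}s")
```

On the matching input the engine succeeds on its first attempt, while on the input ending in `b` it has to try every way of splitting the run of `a` characters before it can report failure.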
There's a comprehensive comparison page here: Regular Expression Flavor Comparison.
Some languages implement a particular style, so look up your language on that page and determine which column to look at. For example, JavaScript will be under ECMA. For sed it depends on whether you're using UNIX or Linux (from the page):
The sed UNIX tool uses POSIX BRE. Linux usually ships with the GNU implementation, which uses "GNU BRE".
The Wikipedia comparison of regular expression engines chart is comprehensive and easy to understand.
There is also Richard Kettlewell's regexp syntax summary.