Do the "control characters" used in regular expressions differ much among implementations of regex parsers (e.g. regex in Ruby, Java, C#, sed, etc.)?
For example, in Ruby, \D means "not a digit"; does it mean the same in Java, C# and sed?
I guess what I'm asking is: is there a "standard" for regexes that all regex parsers support?
If not, is there some common subset that should be learned and mastered first, with the parser-specific parts picked up as they're encountered?
See the list of basic syntax on regular-expressions.info, and a comparison of the different "flavors".
There is a common core which is very simple. It corresponds to the regular expressions as implemented in the original software tools such as ed, grep, sed, and awk. This is worth learning, because the other formats are all supersets of this one.†
. match any character
[abc] match a, b, or c
[^abc] match a character other than a, b, or c
[a-c] match the range from a to c
^ match the beginning of the line
$ match the end of the line
* match zero or more of the preceding item (a character, bracket expression, or group)
\(...\) group for use as a back-reference
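† I've left out POSIX bracket expressions (such as [[:digit:]]) because they are rarely used and aren't in the subset. Note also that grouping is the one place the subset differs: the classic expressions write groups as \(...\), while in most other flavors plain parentheses (...) are "magic" by default and group without backslashes.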
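
As a rough illustration of the common core listed above, here is a minimal sketch using Python's re module (chosen only as an example flavor; the question did not ask about Python). The helper first_match and the examples table are made-up names for this sketch. Two things to note: grouping is written (...) here rather than \(...\), as the footnote describes, and \D is not part of the classic core, but Python's re does accept it with the "not a digit" meaning.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (pattern, input text, expected first match); illustrative cases only
examples = [
    (r"h.t",    "hat",   "hat"),   # .      any character
    (r"[abc]",  "xbz",   "b"),     # [abc]  one of a, b, c
    (r"[^abc]", "abcd",  "d"),     # [^abc] anything except a, b, c
    (r"[a-c]",  "xyzc",  "c"),     # [a-c]  range from a to c
    (r"^ab",    "abab",  "ab"),    # ^      anchored at the start
    (r"ab$",    "abab",  "ab"),    # $      anchored at the end
    (r"ab*",    "abbbc", "abbb"),  # *      zero or more of the preceding item
    (r"\D",     "42x",   "x"),     # \D     not a digit (beyond the classic core)
]

for pattern, text, expected in examples:
    assert first_match(pattern, text) == expected, (pattern, text)

# Group plus back-reference: (ab)\1 matches "ab" followed by the same text again.
# Classic sed/grep syntax would spell the group as \(ab\)\1 instead.
doubled = re.search(r"(ab)\1", "xxababyy")
print(doubled.group(0))
```

Running the same patterns through another tool is a quick way to see where its flavor departs from this subset.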