For PCRE Regular Expressions, what is the difference between [abc] and (a|b|c)?
The patterns in your question match the same text. They differ in implementation: they correspond to different automata and have different side effects, namely that the parenthesized form (a|b|c) captures the matched substring while the character class [abc] does not.
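To make the capture side effect concrete, here is a minimal sketch. It uses Python's re module as a stand-in for PCRE, on the assumption that the two agree for patterns this simple; the variable names are only illustrative.

```python
import re

# Python's re module stands in for PCRE here (an assumption of this sketch).
char_class = re.compile(r"[abc]")
alternation = re.compile(r"(a|b|c)")
non_capturing = re.compile(r"(?:a|b|c)")

text = "xbz"

m_class = char_class.search(text)
m_alt = alternation.search(text)
m_nc = non_capturing.search(text)

# All three patterns match the same substring.
print(m_class.group(0), m_alt.group(0), m_nc.group(0))

# Only the parenthesized alternation records a capture group.
print(m_class.groups())  # ()
print(m_alt.groups())    # ('b',)
print(m_nc.groups())     # ()
```

If you want alternation without the capture, the non-capturing form (?:a|b|c) avoids that side effect.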
In a comment below, Garrett Albright points out a subtle distinction. Whereas (.|\n) matches any character, [.\n] matches only a literal dot or a newline. Although dot is no longer special inside a character class, other characters such as -, ^, and ], along with sequences such as [:lower:], take on special meanings there. Care is necessary to preserve special semantics when moving from one context to the other, and sometimes that isn’t possible. Take \1: outside a character class it refers to the first capture group (an archaic way of writing $1), but inside a character class it always matches the character SOH.
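A short sketch of these context differences, again using Python's re module as a stand-in for PCRE (an assumption; the two treat these particular constructs the same way as far as I know). The sample strings are made up for illustration.

```python
import re

any_char = re.compile(r"(.|\n)")
dot_or_newline = re.compile(r"[.\n]")

sample = "a.\n"

# (.|\n) matches every character, one at a time.
any_hits = [m.group(0) for m in any_char.finditer(sample)]
print(any_hits)    # ['a', '.', '\n']

# [.\n] matches only the literal dot and the newline.
class_hits = dot_or_newline.findall(sample)
print(class_hits)  # ['.', '\n']

# Inside a class, \1 is the character with code 1 (SOH), not a backreference.
soh_class = re.compile(r"[\1]")
print(soh_class.fullmatch("\x01") is not None)  # True

# Outside a class, \1 refers back to the first capture group.
backref = re.compile(r"(a)\1")
print(backref.fullmatch("aa") is not None)      # True

# A hyphen must be escaped (or placed first or last) to be literal in a class.
literal_hyphen = re.findall(r"[a\-c]", "abc-")
print(literal_hyphen)  # ['a', 'c', '-']
```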
Character classes ([...]) are optimized for matching one out of some set of characters, and alternatives (x|y) allow for more general choices of varying lengths. You will tend to see better performance if you keep these design principles in mind.
Regex implementations transform source code such as /[abc]/ into finite-state automata, usually NFAs. What we think of as regex engines are more-or-less bookkeepers that assist execution of those target state machines. A sufficiently smart regex compiler would generate the same machine code for equivalent regexes, but this is difficult and expensive in the general case because of the lurking exponential complexity.
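As a rough illustration of the two design roles, the sketch below checks that a class and a single-character alternation find the same matches, then shows an alternation with branches of different lengths, which a class cannot express. It uses Python's re module rather than PCRE, and compare_timings is a hypothetical helper added only for this example.

```python
import re
import timeit

vowel_class = re.compile(r"[aeiou]")
vowel_alt = re.compile(r"(?:a|e|i|o|u)")

# Alternation can choose between branches of different lengths.
unit = re.compile(r"(?:ms|s|min|h)")

sample = "character classes and alternation"

# For single characters, both forms find exactly the same matches.
same_matches = vowel_class.findall(sample) == vowel_alt.findall(sample)
print(same_matches)  # True

# A three-character branch is something a character class cannot express.
print(unit.fullmatch("min") is not None)  # True


def compare_timings(number=10_000):
    """Return (class_seconds, alternation_seconds) for scanning `sample`."""
    t_class = timeit.timeit(lambda: vowel_class.findall(sample), number=number)
    t_alt = timeit.timeit(lambda: vowel_alt.findall(sample), number=number)
    return t_class, t_alt
```

Calling compare_timings() gives a pair of timings for your own machine. The numbers depend on the engine and its version, and an engine that rewrites single-character alternations into a class internally may show no difference at all, so measure rather than assume.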
For an accessible introduction to the theory behind regexes, read “How Regexes Work” by Mark Dominus. For deeper study, consider An Introduction to Formal Languages and Automata by Peter Linz.