I'm trying to create some general code to ease the usage of regexes, and I'm thinking about how to implement the OR function.
The title is pretty accurate (ex1,ex2,ex3 are any regular expressions). Not considering grouping, what's the difference between:
"(ex1)|(ex2)|(ex3)"
and
"[(ex1)(ex2)(ex3)]"
Both of these should express an OR relation between the named regexes, but I might be missing something. Is one more efficient than the other?
`(ex1)|(ex2)|(ex3)` matches `ex1` (available in group 1), `ex2` (available in group 2) or `ex3` (available in group 3).
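A minimal Python sketch of this behaviour, assuming the placeholders `ex1`, `ex2` and `ex3` are taken as literal text (in the question they stand for arbitrary regexes). The helper name `matched_groups` is made up for illustration:

```python
import re

# Three alternatives, each wrapped in its own capture group.
pattern = re.compile(r"(ex1)|(ex2)|(ex3)")

def matched_groups(text):
    """Return the group tuple of the first match in text, or None."""
    m = pattern.search(text)
    return m.groups() if m else None

print(matched_groups("ex1"))  # ('ex1', None, None)
print(matched_groups("ex2"))  # (None, 'ex2', None)
print(matched_groups("ex3"))  # (None, None, 'ex3')
print(matched_groups("foo"))  # None
```

Only the group belonging to the alternative that actually matched is filled in; the other two stay `None`.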
`[(ex1)(ex2)(ex3)]` matches `(`, `e`, `x`, `1`, `2`, `3` or `)`.
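To see the single-character behaviour, here is a small sketch using Python's `re` module. The sample string is arbitrary:

```python
import re

# Inside [...] the parentheses are ordinary characters, not group delimiters.
char_class = re.compile(r"[(ex1)(ex2)(ex3)]")

# findall returns every single-character match, in order of appearance.
hits = char_class.findall("(x3) yes!")
print(hits)  # ['(', 'x', '3', ')', 'e']

# The class consumes exactly one character, so it cannot match "ex1" in full.
print(char_class.fullmatch("ex1"))  # None
```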
(ex1)|(ex2)|(ex3)
Here you are capturing `ex1`, `ex2` and `ex3`.
Here:
[(ex1)(ex2)(ex3)]
`(` and `)` are treated as literal characters because they're enclosed in `[` and `]` (a character class), so it matches a single `(`, `)`, `e`, `x`, `1`, `2` or `3`.
Note that it's equivalent to (the order is not important):
[ex123)(]
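One way to check that equivalence is to compare both classes on every printable ASCII character. This is only a sketch in Python; the variable names are illustrative:

```python
import re
import string

original = re.compile(r"[(ex1)(ex2)(ex3)]")
reordered = re.compile(r"[ex123)(]")

# The two classes agree on every printable ASCII character.
same = all(
    bool(original.fullmatch(ch)) == bool(reordered.fullmatch(ch))
    for ch in string.printable
)
print(same)  # True

# The full set of characters the class accepts, in ASCII order.
accepted = sorted(ch for ch in string.printable if original.fullmatch(ch))
print(accepted)  # ['(', ')', '1', '2', '3', 'e', 'x']
```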
Important notes on character sets:
The caret (`^`) and the hyphen (`-`) can be included as literals, but their position matters. To include a literal hyphen, place it at the very beginning (or the very end) of the character class; between two characters it defines a range. To match the caret as part of the character set, do not put it as the first character, where it negates the class:
`[^]x]` matches any character other than `]` or `x`, whereas `[]^x]` matches `]`, `^` or `x`.
`[a-z]` matches all letters from `a` to `z`, whereas `[-az]` matches `-`, `a` and `z`.
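The four classes above can be tried out with a short sketch. It uses Python's `re` module, and the helper `accepts` and the sample string are made up for illustration. Note that not every engine parses a leading `]` this way (JavaScript, for one, treats `[]` as an empty class), so check your own flavour:

```python
import re

def accepts(pattern, chars):
    """Return the characters from chars that the pattern matches in full."""
    return [ch for ch in chars if re.fullmatch(pattern, ch)]

sample = "]^x-amz"

print(accepts(r"[^]x]", sample))  # ['^', '-', 'a', 'm', 'z']
print(accepts(r"[]^x]", sample))  # [']', '^', 'x']
print(accepts(r"[a-z]", sample))  # ['x', 'a', 'm', 'z']
print(accepts(r"[-az]", sample))  # ['-', 'a', 'z']
```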
They're fundamentally different.
`(ex1)|(ex2)|(ex3)` defines a series of alternating capture groups for the literal text `ex1`, `ex2`, and `ex3`. That is, either `ex1`, if present, will be captured in the first capture group; or `ex2`, if present, will be captured in a second capture group; or `ex3`, if present, will be captured in a third group. (This would be a fairly odd expression; a more likely one would be `(ex1|ex2|ex3)`, which matches and captures either `ex1`, `ex2`, or `ex3`.)
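For the asker's goal of a general OR helper, one possible sketch in Python follows. The function name `any_of` and the sample sub-patterns are hypothetical, and wrapping each part in a non-capturing group `(?:...)` is my own choice rather than something the question prescribes:

```python
import re

def any_of(*patterns):
    """Combine sub-patterns into one alternation (hypothetical helper).

    Each sub-pattern is wrapped in a non-capturing group (?:...) so that
    its own alternations cannot leak into its neighbours and no extra
    capture groups are introduced.
    """
    return "(?:" + "|".join(f"(?:{p})" for p in patterns) + ")"

combined = any_of(r"\d+", r"[a-c]x", r"foo|bar")
print(combined)  # (?:(?:\d+)|(?:[a-c]x)|(?:foo|bar))

regex = re.compile(combined)
print(bool(regex.fullmatch("123")))  # True
print(bool(regex.fullmatch("bx")))   # True
print(bool(regex.fullmatch("bar")))  # True
print(bool(regex.fullmatch("dx")))   # False
print(regex.groups)                  # 0
```

If you do want to know which alternative matched, swap the inner `(?:...)` for capturing groups, at the cost of shifting any group numbers used inside the sub-patterns.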
`[(ex1)(ex2)(ex3)]` defines a character class that will match exactly one of the following characters: `(`, `e`, `x`, `1`, `)`, `2` or `3`. There are no capture groups; the text within the `[]` is treated literally.
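The difference in capture groups can be checked directly. This sketch uses the `groups` attribute of a compiled pattern in Python's `re` module, which reports how many capture groups the pattern defines:

```python
import re

sources = (r"(ex1)|(ex2)|(ex3)", r"(ex1|ex2|ex3)", r"[(ex1)(ex2)(ex3)]")

# Map each pattern source to the number of capture groups it defines.
group_counts = {src: re.compile(src).groups for src in sources}

for src, count in group_counts.items():
    print(src, count)
# (ex1)|(ex2)|(ex3) 3
# (ex1|ex2|ex3) 1
# [(ex1)(ex2)(ex3)] 0
```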
The `Pattern` class documentation goes into detail about how patterns work.
In the first regex, `(ex1)|(ex2)|(ex3)`, you are going to match three groups denoted by the parentheses (i.e. `ex1`, `ex2`, `ex3`), so you will get results that match whatever the `ex1` regex matches, whatever the `ex2` regex matches, or whatever the `ex3` regex matches.
Whereas in the second, `[(ex1)(ex2)(ex3)]`, there will be no groups, because inside `[]` brackets the parentheses are treated as ordinary characters. So each match is a single character from the set `(`, `)`, `e`, `x`, `1`, `2`, `3`, not anything that the `ex1`, `ex2` or `ex3` regexes would match.