
What's the difference between "(ex1)|(ex2)|(ex3)" and "[(ex1)(ex2)(ex3)]"

Tags: java, regex

I'm trying to write some general-purpose code to make regexes easier to use, and I'm wondering how to implement the OR operation.

The title says it all (ex1, ex2 and ex3 stand for arbitrary regular expressions). Leaving capture groups aside, what's the difference between:

"(ex1)|(ex2)|(ex3)"

and

"[(ex1)(ex2)(ex3)]"

Both of these should express an OR relation between the named regexes, but I might be missing something. Is one more efficient than the other?

Attila Neparáczki asked Nov 27 '14 09:11


4 Answers

(ex1)|(ex2)|(ex3) matches ex1 (available in group 1), ex2 (available in group 2) or ex3 (available in group 3)



[(ex1)(ex2)(ex3)] matches a single character: (, e, x, 1, 2, 3 or )
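To see the difference in practice, here is a minimal sketch using java.util.regex. The class name, the firstMatch helper and the sample input are made up for illustration; ex1, ex2 and ex3 are taken as literal text here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationVsClass {
    // Alternation of three groups: matches one of the whole strings ex1, ex2, ex3.
    static final Pattern ALTERNATION = Pattern.compile("(ex1)|(ex2)|(ex3)");
    // Character class: matches exactly one character out of ( ) e x 1 2 3.
    static final Pattern CHAR_CLASS = Pattern.compile("[(ex1)(ex2)(ex3)]");

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(ALTERNATION, "see ex2 here")); // ex2
        System.out.println(firstMatch(CHAR_CLASS, "see ex2 here"));  // e
        // The class can never match a three-character string as a whole.
        System.out.println(ALTERNATION.matcher("ex1").matches());    // true
        System.out.println(CHAR_CLASS.matcher("ex1").matches());     // false
    }
}
```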

sp00m answered Nov 20 '22 18:11


(ex1)|(ex2)|(ex3)

Here you are matching ex1, ex2 or ex3, and capturing each in its own group.

Here:

[(ex1)(ex2)(ex3)]

( and ) are treated as literal characters because they're enclosed in [ and ] (a character class), so the pattern matches any single one of (, ), e, x, 1, 2 and 3.

Note that it's equivalent to (the order is not important):

[ex123)(]
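A quick way to convince yourself of that equivalence is to compare both classes character by character. This is only a sketch: the class and method names are invented for the example, and the check is limited to the ASCII range.

```java
import java.util.regex.Pattern;

public class ClassEquivalence {
    static final Pattern ORIGINAL = Pattern.compile("[(ex1)(ex2)(ex3)]");
    static final Pattern REDUCED = Pattern.compile("[ex123)(]");

    // Collects, in ASCII order, every single character the pattern accepts.
    static String accepted(Pattern p) {
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accepted(ORIGINAL)); // ()123ex
        System.out.println(accepted(REDUCED));  // ()123ex
    }
}
```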

Important notes on character sets:

Inside a character class, the caret (^) and the hyphen (-) are special only in certain positions. To match a literal hyphen, place it at the very beginning (or the very end) of the class, where it cannot form a range. To match a literal caret, put it anywhere except first, because a leading ^ negates the class:

  • [^]x] matches any character other than ] and x, whereas []^x] matches ], ^ or x
  • [a-z] matches any lowercase letter from a to z, whereas [-az] matches -, a or z
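The position rules for - and ^ can be checked with a small sketch like the one below. The matchesChar helper is hypothetical, and the negation examples use [^ab] and [a^b] rather than the ] cases, to keep the classes easy to read.

```java
import java.util.regex.Pattern;

public class ClassMetaChars {
    // True if the given character class matches the single character c.
    static boolean matchesChar(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesChar("[a-z]", 'm')); // true:  - forms the range a..z
        System.out.println(matchesChar("[a-z]", '-')); // false: the hyphen itself is not in the range
        System.out.println(matchesChar("[-az]", '-')); // true:  leading - is a literal hyphen
        System.out.println(matchesChar("[-az]", 'm')); // false: only -, a and z are in the class
        System.out.println(matchesChar("[^ab]", 'c')); // true:  leading ^ negates the class
        System.out.println(matchesChar("[a^b]", '^')); // true:  ^ elsewhere is a literal caret
        System.out.println(matchesChar("[a^b]", 'c')); // false: only a, ^ and b are in the class
    }
}
```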
Maroun answered Nov 20 '22 16:11


They're fundamentally different.

(ex1)|(ex2)|(ex3) defines an alternation of three capture groups for the literal text ex1, ex2, and ex3. That is, either ex1, if present, will be captured in the first capture group; or ex2, if present, will be captured in the second capture group; or ex3, if present, will be captured in the third group. (This would be a fairly odd expression. A more likely one would be (ex1|ex2|ex3), which matches and captures either ex1, ex2, or ex3 in a single group.)

[(ex1)(ex2)(ex3)] defines a character class that will match any one of the following characters (just one character): (ex1)23. There are no capture groups; the text within the [] is treated literally.

The Pattern class documentation goes into detail about how patterns work.
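The group numbering described above can be observed with Matcher.group. The following is a sketch only; the matchedGroup helper is invented for this example and simply reports which capture group took part in the first match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns the index of the first capture group that participated in the match,
    // 0 if the regex matched without any group capturing, or -1 if nothing matched.
    static int matchedGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return -1;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) {
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(matchedGroup("(ex1)|(ex2)|(ex3)", "ex2"));  // 2
        System.out.println(matchedGroup("(ex1|ex2|ex3)", "ex2"));      // 1
        System.out.println(matchedGroup("[(ex1)(ex2)(ex3)]", "ex2"));  // 0
    }
}
```

With three separate groups the group index tells you which alternative matched; with the single group you get the matched text in group 1 regardless of the alternative.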

T.J. Crowder answered Nov 20 '22 17:11


In the first regex, (ex1)|(ex2)|(ex3), the parentheses define three groups (i.e. ex1, ex2, ex3), so you will get a match for whatever the ex1 regex matches, or whatever the ex2 regex matches, or whatever the ex3 regex matches.

In the second, [(ex1)(ex2)(ex3)], there are no groups, because inside [] brackets the parentheses are treated as ordinary characters. So you will get a match for any single character from the set (, ), e, x, 1, 2, 3, not for the expressions themselves.
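Since a character class cannot stand in for alternation, a general OR helper has to be built on |. Here is one possible sketch, not a definitive implementation: anyOf is a hypothetical name, and wrapping each sub-expression in a non-capturing group (?:...) is an assumption made so that a | inside one sub-expression cannot leak into its neighbours.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexOr {
    // Builds a pattern that matches whatever any one of the given regexes matches.
    // Each regex is wrapped in (?:...) so it stays a single unit inside the alternation.
    static Pattern anyOf(String... regexes) {
        String joined = Arrays.stream(regexes)
                .map(r -> "(?:" + r + ")")
                .collect(Collectors.joining("|"));
        return Pattern.compile(joined);
    }

    public static void main(String[] args) {
        Pattern p = anyOf("a+", "b|c", "\\d{2}");
        System.out.println(p.pattern());                // (?:a+)|(?:b|c)|(?:\d{2})
        System.out.println(p.matcher("aaa").matches()); // true
        System.out.println(p.matcher("c").matches());   // true
        System.out.println(p.matcher("42").matches());  // true
        System.out.println(p.matcher("ab").matches());  // false
    }
}
```

Capturing groups that already exist inside the sub-expressions are still numbered left to right across the combined pattern, so numbered backreferences inside a sub-expression may need adjusting.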

syntagma answered Nov 20 '22 17:11