
Regex: ?: notation (Question mark and colon notation) [duplicate]

Tags: java, regex

I have the following Java regex, which I didn't write and I am trying to modify:

^class-map(?:(\\s+match-all)|(\\s+match-any))?(\\s+[\\x21-\\x7e]{1,40})$
           ^                                 ^

It's similar to this one.

Note the first question mark. Does it mean that the group is optional? There is already a question mark after the corresponding ). Does the colon have a special meaning in regex?

The regex compiles fine, and there are already JUnit tests that show how it works. It's just that I'm a bit confused about why the first question mark and colon are there.

BJ Dela Cruz asked Jul 17 '12 21:07



2 Answers

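(?: starts a non-capturing group. It matches exactly as ( does; the only difference is that the matched text is not stored and the group gets no number, which matters only if you retrieve groups from the regex after use. So the first question mark is part of the (?: syntax, not a quantifier; the ? after the corresponding ) is what makes the group optional. See "What is a non-capturing group? What does a question mark followed by a colon (?:) mean?".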
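
To make the group numbering concrete, here is a minimal sketch that runs the regex from the question against two sample inputs. The class name ClassMapGroups, the helper groups, and the input strings are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassMapGroups {
    // The pattern from the question, written as a Java string literal.
    static final Pattern P = Pattern.compile(
        "^class-map(?:(\\s+match-all)|(\\s+match-any))?(\\s+[\\x21-\\x7e]{1,40})$");

    // Hypothetical helper: returns groups 1..groupCount() for the input,
    // or null if the input does not match at all.
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i); // null when the group did not take part in the match
        }
        return out;
    }

    public static void main(String[] args) {
        // The (?: ... ) wrapper gets no number, so only three groups exist:
        // 1 = match-all part, 2 = match-any part, 3 = the name.
        System.out.println(Arrays.toString(groups("class-map match-all foo")));
        // The optional wrapper is skipped here, so groups 1 and 2 are null.
        System.out.println(Arrays.toString(groups("class-map foo")));
    }
}
```

If the outer (?: were changed to a plain (, the wrapper would become group 1 and the three existing groups would shift to 2, 3 and 4, which would likely break any code that reads them by number.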

ryanp answered Sep 18 '22 12:09



A little late to this thread - just to build on ryanp's answer.

Assuming you have the string aaabbbccc

Regular Expression

(a)+(b)+(c)+ 

This would give you the following 3 captured groups (a repeated group keeps only the text of its last repetition):

['a', 'b', 'c'] 

Regular Expression with non-capturing parentheses

Use ?: in the first group:

(?:a)+(b)+(c)+ 

and you would get the following groups that matched:

['b', 'c'] 
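
Since the question is about Java, the two results above can be reproduced with java.util.regex. This is only a sketch; the class GroupListDemo and the helper capturedGroups are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupListDemo {
    // Hypothetical helper: after a full match, collects groups 1..groupCount().
    // Returns an empty list if the input does not match.
    static List<String> capturedGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Three capturing groups.
        System.out.println(capturedGroups("(a)+(b)+(c)+", "aaabbbccc"));   // [a, b, c]
        // The first group is non-capturing, so only two groups are numbered.
        System.out.println(capturedGroups("(?:a)+(b)+(c)+", "aaabbbccc")); // [b, c]
    }
}
```

Both patterns match the same strings; only the set of numbered groups differs.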

That is why they are called "non-capturing parentheses".

Example use case:

Sometimes you use parentheses for other things, for example to set the bounds of the | (or) operator:

"New (York|Jersey)" 

In this case, you are only using the parentheses to limit the scope of the | alternation, and you don't really want to capture this data. Use non-capturing parentheses to indicate that:

"New (?:York|Jersey)" 
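
A short Java sketch of the same use case follows. The class AlternationDemo and its two helpers are hypothetical; the point is that the parentheses are needed to bound the alternation either way, and ?: only removes the numbered group.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Same strings match with either kind of parentheses.
        System.out.println(fullMatch("New (York|Jersey)", "New Jersey"));   // true
        System.out.println(fullMatch("New (?:York|Jersey)", "New Jersey")); // true

        // Only the capturing form defines a numbered group.
        System.out.println(groupCount("New (York|Jersey)"));   // 1
        System.out.println(groupCount("New (?:York|Jersey)")); // 0

        // Without parentheses the | splits the whole pattern:
        // "New York" or "Jersey".
        System.out.println(fullMatch("New York|Jersey", "Jersey"));     // true
        System.out.println(fullMatch("New York|Jersey", "New Jersey")); // false
    }
}
```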
14 revs, 12 users 16% answered Sep 21 '22 12:09
