
Can Anyone Explain ?: in regular expression [duplicate]

Tags: regex, tcl

TCL: Can Anyone Explain ?: in regular expression

I am confused about the difference between ? and ?: .

? means the preceding character may or may not be present.

But I don't understand what (?:) indicates.

Can anyone please explain this?

([0-9]+(?:\.[0-9]*)?)
user2742564 asked Sep 14 '13 08:09




1 Answer

Suppose you were looking for something like ABC123 or ABC123.45 in your input string, and you wanted to capture the letters and the numbers separately. You would use a regex (somewhat similar to yours) like

([A-Z]+)([0-9]+(\.[0-9]+)?)

The above regex matches ABC123.45 and also provides three groups, each representing a sub-part of the whole match. The groups are determined by where you put the () brackets and are numbered by the position of their opening bracket. So, given our regex (without using ?:), we get

Group 1 = ABC
Group 2 = 123.45
Group 3 = .45
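
As a quick check of the three groups listed above, here is a minimal sketch in Python rather than Tcl (an assumption made only so the example is easy to run; the group numbering works the same way for this pattern). The variable names are illustrative.

```python
import re

# Capturing version from the answer: three pairs of (), so three groups.
capturing = re.compile(r"([A-Z]+)([0-9]+(\.[0-9]+)?)")

groups = capturing.fullmatch("ABC123.45").groups()
print(groups)  # ('ABC', '123.45', '.45')

# When there is no decimal part, the optional third group matches nothing.
no_decimal = capturing.fullmatch("ABC123").groups()
print(no_decimal)  # ('ABC', '123', None)
```

Note that the third group exists even when it does not take part in the match; it is simply reported as empty (None in Python).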

Now, it may not make much sense to always capture the decimal portion separately, since it is already captured as part of Group 2. So how would you make that group () non-capturing? By putting ?: right after its opening bracket, as in

([A-Z]+)([0-9]+(?:\.[0-9]+)?)

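Now you get only the two desired groups. The (?:...) still groups \.[0-9]+ so that the ? after the closing bracket makes the whole decimal part optional; it just does not create a numbered group of its own.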

Group 1 = ABC
Group 2 = 123.45
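
The same check for the non-capturing pattern, again as a small Python sketch (Python is an assumption here, chosen for easy testing; the names are illustrative):

```python
import re

# Same pattern, but the decimal part is wrapped in (?: ... ) instead of ( ... ).
non_capturing = re.compile(r"([A-Z]+)([0-9]+(?:\.[0-9]+)?)")

result = non_capturing.fullmatch("ABC123.45").groups()
print(result)  # ('ABC', '123.45')

# The compiled pattern reports only two capturing groups.
group_count = non_capturing.groups
print(group_count)  # 2

# The trailing ? still applies to the whole (?: ... ) group,
# so an input without a decimal part matches as well.
without_decimal = non_capturing.fullmatch("ABC123").groups()
print(without_decimal)  # ('ABC', '123')
```

So ?: changes only what is reported back, not what the pattern matches.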

Notice that I also changed the last part of the regex from \.[0-9]* to \.[0-9]+. This keeps the match from including a trailing dot, as in 123. (a number with a dot but no decimal digits); for that input only 123 is matched.
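
To see the difference between \.[0-9]* and \.[0-9]+ on an input with a trailing dot, here is a short sketch using the numeric part of both patterns. It is written in Python as an assumption for runnability, and the variable names are made up for the example.

```python
import re

# Pattern from the question: digits after the dot are optional (*).
star = re.compile(r"([0-9]+(?:\.[0-9]*)?)")
# Pattern from the answer: at least one digit after the dot is required (+).
plus = re.compile(r"([0-9]+(?:\.[0-9]+)?)")

text = "123."
star_match = star.search(text).group(1)
plus_match = plus.search(text).group(1)
print(star_match)  # 123.
print(plus_match)  # 123

# Both patterns agree when decimal digits are present.
both = (star.search("123.45").group(1), plus.search("123.45").group(1))
print(both)  # ('123.45', '123.45')
```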

Ravi K Thapliyal answered Oct 20 '22 05:10