
How does {m}{n} ("exactly n times" twice) work?

Tags: java, regex

So, some way or another (playing around), I found myself with a regex like \d{1}{2}.

Logically, to me, it should mean:

(A digit exactly once) exactly twice, i.e. a digit exactly twice.

But it, in fact, appears to just mean "a digit exactly once" (thus ignoring the {2}).

    String regex = "^\\d{1}{2}$"; // ^$ to make those not familiar with 'matches' happy
    System.out.println("1".matches(regex)); // true
    System.out.println("12".matches(regex)); // false

Similar results can be seen using {n}{m,n} or similar.

Why does this happen? Is it explicitly stated in regex / Java documentation somewhere or is it just a decision Java developers made on-the-fly or is it maybe a bug?

Or is it in fact not ignored and it actually means something else entirely?

Not that it matters much, but this is not across-the-board regex behaviour: Rubular does what I expect.

Note - the title is mainly for searchability for users who want to know how it works (not why).

Bernhard Barker asked Sep 23 '13 12:09



2 Answers

IEEE-Standard 1003.1 says:

The behavior of multiple adjacent duplication symbols ( '*' and intervals) produces undefined results.

So every implementation can do as it pleases, just don't rely on anything specific...

piet.t answered Sep 28 '22 05:09


When I enter your regex in RegexBuddy using the Java regex syntax, it displays the following message:

Quantifiers must be preceded by a token that can be repeated «{2}»

Changing the regex to explicitly use a grouping ^(\d{1}){2} solves that error and works as you expect.


I assume that the Java regex engine simply ignores the offending quantifier and works with what has been compiled so far.

Edit

The reference to the IEEE-Standard in @piet.t's answer seems to support that assumption.

Edit 2 (kudos to @fncomp)

For completeness, one would typically use (?:) to avoid capturing the group. The complete regex then becomes ^(?:\d{1}){2}
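
As a quick check of the two grouped forms suggested above, here is a minimal sketch. The class name GroupedQuantifierDemo and the helper matchesBoth are made up for illustration, and a trailing $ is added to mirror the anchoring used in the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the helper are illustrative only.
public class GroupedQuantifierDemo {
    // Capturing group: (a digit exactly once), repeated exactly twice.
    static final Pattern CAPTURING = Pattern.compile("^(\\d{1}){2}$");
    // Non-capturing variant of the same pattern.
    static final Pattern NON_CAPTURING = Pattern.compile("^(?:\\d{1}){2}$");

    // Returns the match result, after checking that both variants agree.
    static boolean matchesBoth(String input) {
        boolean capturing = CAPTURING.matcher(input).matches();
        boolean nonCapturing = NON_CAPTURING.matcher(input).matches();
        if (capturing != nonCapturing) {
            throw new IllegalStateException("variants disagree on: " + input);
        }
        return capturing;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1", "12", "123"}) {
            System.out.println(s + " -> " + matchesBoth(s));
        }
        // 1 -> false
        // 12 -> true
        // 123 -> false
    }
}
```

With the explicit group, the second quantifier has an unambiguous token to repeat, so the pattern no longer relies on how the engine treats adjacent quantifiers.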

Lieven Keersmaekers answered Sep 28 '22 06:09