
Difference between regex quantifiers plus and star

Tags:

regex

I am trying to extract the error number from strings like "Wrong parameters - Error 1356":

 Pattern p = Pattern.compile("(\\d*)");
 Matcher m = p.matcher(myString);
 m.find();
 System.out.println(m.group(1));

This does not print anything, which seemed strange to me, since * is described on Wikipedia as: "Matches the preceding element zero or more times".

I also tested it on www.regexr.com and regex101.com, and the result was the same: nothing for the expression \d*

Then I started testing some variations (all tests were made on the sites mentioned above):

  • (\d)* doesn't work
  • \d{0,} doesn't work
  • [\d]* doesn't work
  • [0-9]* doesn't work
  • \d{4} works
  • \d+ works
  • (\d+) works
  • [0-9]+ works

So I searched the web for an explanation. The best I could find was the Quantifier section here, which states:

\d? Optional digit (one or none).
\d* Eat as many digits as possible (but none if necessary)
\d+ Eat as many digits as possible, but at least one.
\d*? Eat as few digits as necessary (possibly none) to return a match.
\d+? Eat as few digits as necessary (but at least one) to return a match.

The question

As English is not my primary language, I am having trouble understanding the difference (mainly the "(but none if necessary)" part). Could the regex experts here explain this in simple words, please?

The closest question I found here on SO was this one: Regex: possessive quantifier for the star repetition operator, i.e. \d**, but it does not explain the difference.

Jorge Campos asked Feb 12 '23 15:02


2 Answers

The * quantifier matches zero or more occurrences.

In practice, this means that

\d*

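will match any input, including the empty string, because matching zero digits always succeeds. A search returns the leftmost match, so your regex matches at the very start of the input string (before the "W", where there are no digits) and returns the empty string.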
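To make the empty match visible, here is a minimal sketch using the string from the question. The class name StarVsPlus and the helper firstMatch are made up for illustration; only Pattern and Matcher come from java.util.regex. The helper reports where the first match starts, so the zero-length match of \d* can be told apart from "no match".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class StarVsPlus {
    // Returns the first match as "startIndex:matchedText", or null if find() fails.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.start() + ":" + m.group();
    }

    public static void main(String[] args) {
        String s = "Wrong parameters - Error 1356";
        // \d* succeeds immediately at index 0 by matching zero digits.
        System.out.println("[" + firstMatch("\\d*", s) + "]"); // prints [0:]
        // \d+ needs at least one digit, so the search moves on until it reaches 1356.
        System.out.println("[" + firstMatch("\\d+", s) + "]"); // prints [25:1356]
    }
}
```

With \d+ the engine cannot succeed at index 0, so it keeps advancing through the string until it finds a position where at least one digit is available.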

Frank Schmitt answered Feb 15 '23 11:02


"but none if necessary" means that the pattern as a whole does not fail when there are no digits to match: \d* then simply matches nothing. So \d* matches zero or more digits.

For example,

\d*[a-z]*

will match

abcdef

but \d+[a-z]*

will not match

abcdef

because \d+ implies that at least one digit is required.
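The example above can be checked with a short sketch. The class name OptionalDigits and the helper matchesWhole are hypothetical; Pattern.matches tests whether the whole input matches the regex. The input "42abcdef" is an extra case added here for contrast.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the two patterns discussed above.
class OptionalDigits {
    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \d* is satisfied by zero digits, so the letters alone are enough.
        System.out.println(matchesWhole("\\d*[a-z]*", "abcdef"));   // prints true
        // \d+ demands at least one digit, and "abcdef" has none.
        System.out.println(matchesWhole("\\d+[a-z]*", "abcdef"));   // prints false
        // Once a leading digit is present, \d+ is satisfied as well.
        System.out.println(matchesWhole("\\d+[a-z]*", "42abcdef")); // prints true
    }
}
```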

Wes answered Feb 15 '23 10:02