
What's the difference between regex{m,n} and (regex){m,n}?

Tags: java, regex

I'm developing a Docker project and need to write a regex to validate repository names. The requirements are as follows:

  1. Only ASCII characters are allowed, and no uppercase letters.
  2. No special characters except dot (.), hyphen (-), and underscore (_).
  3. The name must start and end with a letter or a digit.
  4. Special characters cannot appear consecutively.
  5. Length limit: minimum 2, maximum 255.

My regex is:

([a-z0-9]+(?:[._-][a-z0-9]+)*){2,255}

But it does not match when the repository name is e-e_1.1.

When I change it to:

[a-z0-9]+(?:[._-][a-z0-9]+)*{2,255}

it matches.

Can someone explain why? Thank you in advance.

xautjzd asked Feb 29 '16 08:02


1 Answer

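In the ([a-z0-9]+(?:[._-][a-z0-9]+)*){2,255} regex, the limiting quantifier {2,255} is applied to the whole pattern inside Group 1, ([a-z0-9]+(?:[._-][a-z0-9]+)*). It means the group must be repeated 2 to 255 times. It does not mean the whole string length is restricted to 2 to 255 characters. This is also why e-e_1.1 fails: every repetition of the group must start and end with a letter or digit, so two repetitions can only meet between two adjacent alphanumeric characters, and e-e_1.1 has no such pair.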
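To see the group repetition in action, here is a minimal Java sketch. The class and method names (GroupQuantifierDemo, matchesGrouped) are made up for illustration, and String.repeat assumes Java 11 or later. It uses matches(), so the whole input must be consumed by the pattern.

```java
import java.util.regex.Pattern;

class GroupQuantifierDemo {
    // The first pattern from the question: {2,255} repeats the whole group,
    // it does not count characters.
    static final Pattern GROUPED =
            Pattern.compile("([a-z0-9]+(?:[._-][a-z0-9]+)*){2,255}");

    // Hypothetical helper: true only if the entire name matches the pattern.
    static boolean matchesGrouped(String name) {
        return GROUPED.matcher(name).matches();
    }

    public static void main(String[] args) {
        // No two adjacent alphanumerics, so the string cannot be split
        // into two repetitions of the group.
        System.out.println(matchesGrouped("e-e_1.1"));       // false
        // Splits into "a" + "b": two repetitions.
        System.out.println(matchesGrouped("ab"));            // true
        // 300 characters still match: the length is not limited at all.
        System.out.println(matchesGrouped("a".repeat(300))); // true
    }
}
```

The last call is the important one: a 300-character name passes, which shows that {2,255} on the group says nothing about the total length.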

Your [a-z0-9]+(?:[._-][a-z0-9]+)*{2,255} regex can match an unlimited number of characters, too: [a-z0-9]+ matches one or more characters, and (?:[._-][a-z0-9]+)* matches zero or more separator-plus-alphanumeric sequences, each of any length. The limiting quantifier {2,255} placed after the * does not restrict the overall length, so it does not do what you need here.

To restrict the length of the input string to 2 to 255 characters, you will have to use a lookahead anchored at the start:

^(?=.{2,255}$)[a-z0-9]+(?:[._-][a-z0-9]+)*$
 ^^^^^^^^^^^^^

The (?=.{2,255}$) lookahead is executed only once, at the beginning of the string, and a match is found only if the condition inside the lookahead is met: there must be 2 to 255 characters up to the end of the string. The . matches any character other than a line terminator, but that does not matter here because the rest of the pattern only allows specific characters anyway.
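A minimal Java sketch of the lookahead version, in case it helps to try it out. The class and method names (RepoNameValidator, isValid) are hypothetical, and String.repeat assumes Java 11 or later. Since matches() already requires the whole input to match, the ^ and $ anchors are redundant in this particular call, but they are kept so the pattern is the same as the one shown.

```java
import java.util.regex.Pattern;

class RepoNameValidator {
    // The lookahead checks the total length once at the start;
    // the rest of the pattern checks the allowed characters and their order.
    static final Pattern REPO_NAME =
            Pattern.compile("^(?=.{2,255}$)[a-z0-9]+(?:[._-][a-z0-9]+)*$");

    // Hypothetical helper: true only if the entire name is valid.
    static boolean isValid(String name) {
        return REPO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("e-e_1.1"));       // true
        System.out.println(isValid("a"));             // false: shorter than 2
        System.out.println(isValid("a..b"));          // false: consecutive separators
        System.out.println(isValid("-ab"));           // false: starts with a separator
        System.out.println(isValid("Ab"));            // false: uppercase letter
        System.out.println(isValid("a".repeat(255))); // true: exactly at the limit
        System.out.println(isValid("a".repeat(256))); // false: one character too long
    }
}
```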

Wiktor Stribiżew answered Oct 13 '22 01:10