
Why is this regex allowing a caret?

Tags: regex

http://regexr.com/3ars8

^(?=.*[0-9])(?=.*[A-z])[0-9A-z-]{17}$ 

Should match "17 alphanumeric chars, hyphens allowed too, must include at least one letter and at least one number"

It'll correctly match:

ABCDF31U100027743 

and correctly decline to match:

AB$DF31U100027743 

(and almost any other non-alphanumeric char)

but will apparently allow:

AB^DF31U100027743 
Wintermute asked Apr 21 '15 12:04


2 Answers

Because your character class [A-z] matches this symbol.

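A range in a character class follows character-code order. [A-z] runs from A (ASCII 65) to z (ASCII 122), so besides the English letters it also matches the six characters that sit between Z and a: [, \, ], ^, _, and `.

This is a common mistake. Use [a-zA-Z] instead to allow only English letters.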
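To see this for yourself, here is a minimal sketch in Python (the question does not name a language, so Python's re module is an assumption here). It tests every ASCII character against [A-z] and lists the ones that are not letters. The names wide_class, matched and extras are only illustrative.

```python
import re

# Test each of the 128 ASCII characters against the class [A-z].
wide_class = re.compile(r"[A-z]")
matched = [chr(c) for c in range(128) if wide_class.fullmatch(chr(c))]

# Keep only the accepted characters that are not English letters.
extras = [ch for ch in matched if not ch.isalpha()]

print(len(matched))  # 58 characters: 52 letters plus 6 extras
print(extras)        # ['[', '\\', ']', '^', '_', '`']
```

The caret from the question is among the six extras, which is why AB^DF31U100027743 gets through.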

Here is a visualization from Expresso, showing what the range [A-z] actually covers:

[Image: screenshot from Expresso showing the ASCII table and the characters covered by the [A-z] range]

So this regex (with the i option, i.e. case-insensitive matching) won't match your string AB^DF31U100027743:

^(?=.*[0-9])(?=.*[a-z])[0-9a-z-]{17}$ 

In my opinion, it is safer to use the ignore-case option: it avoids this kind of issue and shortens the regex.
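As a quick check, the sketch below runs the original and the corrected pattern against the three strings from the question. It assumes Python's re module, with re.IGNORECASE standing in for the i option. The names original, fixed, samples and results are illustrative.

```python
import re

# Pattern from the question: [A-z] also covers [ \ ] ^ _ `
original = re.compile(r"^(?=.*[0-9])(?=.*[A-z])[0-9A-z-]{17}$")

# Corrected pattern: [a-z] plus case-insensitive matching
fixed = re.compile(r"^(?=.*[0-9])(?=.*[a-z])[0-9a-z-]{17}$", re.IGNORECASE)

samples = ["ABCDF31U100027743", "AB$DF31U100027743", "AB^DF31U100027743"]

# Map each sample to (matches original?, matches fixed?)
results = {s: (bool(original.match(s)), bool(fixed.match(s))) for s in samples}

for s, (old, new) in results.items():
    print(f"{s}  original={old}  fixed={new}")
# ABCDF31U100027743  original=True  fixed=True
# AB$DF31U100027743  original=False  fixed=False
# AB^DF31U100027743  original=True  fixed=False
```

One caveat specific to Python: on str patterns, re.IGNORECASE makes [a-z] also match a few non-ASCII letters (for example the Kelvin sign). Adding re.ASCII, or writing [a-zA-Z] without the flag, keeps the match to the 52 English letters.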

Wiktor Stribiżew answered Oct 13 '22 08:10


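A range in a regex character class is defined by character codes, and the printable ASCII characters run from space (32) to tilde (126).

When we use the [A-z] token, it matches every character from A (65) to z (122), which are the characters highlighted in the following table. If we use the [ -~] token, it matches every printable ASCII character, from space to tilde.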

[Image: ASCII table with the characters matched by [A-z] highlighted]
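If the image is not available, the same ranges can be listed with a short script. This is a minimal sketch assuming Python's re module. The helper class_members is hypothetical and written only for this illustration.

```python
import re

def class_members(char_class: str) -> list[str]:
    """Return the ASCII characters accepted by a single character class."""
    pattern = re.compile(char_class)
    return [chr(c) for c in range(128) if pattern.fullmatch(chr(c))]

printable = class_members(r"[ -~]")  # space (32) through tilde (126)
a_to_z = class_members(r"[A-z]")     # A (65) through z (122)

print(len(printable), ord(printable[0]), ord(printable[-1]))  # 95 32 126
print(len(a_to_z), ord(a_to_z[0]), ord(a_to_z[-1]))           # 58 65 122
print("".join(a_to_z[26:32]))                                 # [\]^_`
```

The slice a_to_z[26:32] picks out the six characters between Z and a, the part of the range that is easy to overlook.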

Premraj answered Oct 13 '22 08:10