 

What does a-z-A-Z mean in a regular expression?

Tags:

regex

go

I've been working with someone else's code and I ran across the regular expression [^0-9a-z-A-Z]. This closely resembles the common [^0-9a-zA-Z], which matches any non-alphanumeric character (and is typically used to strip such characters out), but note the extra dash in the middle, between the lowercase z and uppercase A.

I'm not very familiar with regular expressions, but I've read several pages on them now, and none of the rules I've seen seem to cover what this syntax would mean. Perhaps it's not even valid syntax, but the Golang regex interpreter doesn't seem to mind. I'd appreciate any clarification. Thanks.

ireardon asked May 09 '15 20:05



1 Answer

A dash in a character class, in a position where it cannot be read as part of a range, is treated as a literal dash. Here a-z is already a complete range, so the dash that follows it cannot start a new one. The expression therefore matches any character other than 0 to 9, a to z, A to Z, and -. That's why there's no syntax error.
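To see the literal dash in action, here is a minimal sketch that strips everything the class matches from a string. The names questionPattern and stripDisallowed and the sample input are made up for illustration; only the pattern itself comes from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// questionPattern is the character class from the question, unchanged.
// The variable name is hypothetical.
var questionPattern = regexp.MustCompile(`[^0-9a-z-A-Z]`)

// stripDisallowed (hypothetical helper) deletes every character that the
// negated class matches, i.e. everything except digits, ASCII letters and "-".
func stripDisallowed(s string) string {
	return questionPattern.ReplaceAllString(s, "")
}

func main() {
	// The dash survives; the underscore, space and "!" are removed.
	fmt.Println(stripDisallowed("foo-bar_baz 42!")) // prints "foo-barbaz42"
}
```

If the dash were not part of the class, the dash in the input would be removed along with the other punctuation.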

It's probably a typo though. If the dash is meant to be there, then to prevent confusion it should be escaped and/or moved out from between the ranges, such as [^0-9a-zA-Z\-].

Boann answered Sep 21 '22 16:09