I've been working with someone else's code and I ran across the regular expression [^0-9a-z-A-Z]. This bears close resemblance to the common [^0-9a-zA-Z], which is meant to match non-alphanumeric characters, but note the extra dash in the middle, between the lowercase z and the uppercase A.
I'm not very familiar with regular expressions, but I've read several pages on them now, and none of the rules I've seen seem to cover what this syntax would mean. Perhaps it's not even valid syntax, but the Golang regex interpreter doesn't seem to mind. I'd appreciate any clarification. Thanks.
\w+ matches 1 or more word characters (same as [a-zA-Z0-9_]+). [.-]? matches an optional . or - character. Although the dot (.) has a special meaning in regex, inside a character class (square brackets) any character except ^, -, ], or \ is a literal and does not require an escape sequence.
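To make the point about literal characters inside a class concrete, here is a minimal Go sketch. The helper name matchesWhole is hypothetical, introduced only for this illustration; it anchors a pattern so that it must match the whole input.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesWhole is a hypothetical helper: it reports whether pattern
// matches the entire input string s.
func matchesWhole(pattern, s string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(s)
}

func main() {
	fmt.Println(matchesWhole(`.`, "a"))    // true: a bare dot matches any character
	fmt.Println(matchesWhole(`[.]`, "a"))  // false: inside a class the dot is literal
	fmt.Println(matchesWhole(`[.]`, "."))  // true
	fmt.Println(matchesWhole(`[.-]`, "-")) // true: a trailing dash is literal too
}
```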
[0-9a-zA-Z.+_]+ matches one or more characters, each of which is a digit from 0 to 9, a lowercase letter from a to z, an uppercase letter from A to Z, a period, a plus sign, or an underscore.
[A-z] will match ASCII characters in the range from A to z, while [a-zA-Z] will match ASCII characters in the range from A to Z and in the range from a to z. At first glance, these might seem equivalent. However, if you look at a table of ASCII characters, you'll see that A-z also includes the six characters that sit between Z and a: [, \, ], ^, _, and `.
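One way to check this without an ASCII table is to enumerate the difference directly. The following is a small Go sketch; extraChars is a hypothetical name chosen for the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// extraChars is a hypothetical helper: it returns the ASCII characters
// matched by [A-z] but not by [a-zA-Z].
func extraChars() string {
	wide := regexp.MustCompile(`^[A-z]$`)
	narrow := regexp.MustCompile(`^[a-zA-Z]$`)
	var out []byte
	for c := byte(0); c < 128; c++ {
		s := string([]byte{c})
		if wide.MatchString(s) && !narrow.MatchString(s) {
			out = append(out, c)
		}
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", extraChars()) // "[\\]^_`"
}
```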
A dash in a character class in a place where it cannot be interpreted as a range is interpreted as a literal dash. So the expression excludes the characters 0 to 9, a to z, A to Z, and -. That's why there's no syntax error.
It's probably a typo though. If the dash is meant to be there, then to prevent confusion it should be escaped and/or moved out from between the ranges, such as [^0-9a-zA-Z\-].
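A quick way to confirm this in Go is to strip everything each class matches from a sample string. This is only a sketch: the helper name strip and the sample input are made up for the illustration. The original pattern and the escaped form keep the dash, while the common pattern without the dash removes it.

```go
package main

import (
	"fmt"
	"regexp"
)

// strip is a hypothetical helper: it deletes every match of pattern from s.
func strip(pattern, s string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(s, "")
}

func main() {
	input := "foo-bar_baz 42!"

	fmt.Println(strip(`[^0-9a-z-A-Z]`, input))  // foo-barbaz42 (dash kept: it is a literal in the class)
	fmt.Println(strip(`[^0-9a-zA-Z\-]`, input)) // foo-barbaz42 (same behavior, clearer intent)
	fmt.Println(strip(`[^0-9a-zA-Z]`, input))   // foobarbaz42 (dash removed)
}
```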