If I have a regex that is [0-Z] or [a-Z], what characters would it match? Is it valid regex? Can you have ranges in regex outside of 0-9, a-z and A-Z?
Special Regex Characters: These characters have special meaning in regex: . + * ? ^ $ ( ) [ ] { } | \ . Escape Sequences (\char): To match a character that has special meaning in regex, prefix it with a backslash ( \ ) to form an escape sequence. E.g., \. matches "."
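As a small illustration of escaping, here is a sketch using Python's re module (the choice of Python and the sample strings are assumptions, not something the original text specifies). An unescaped dot matches any character, \. matches only a literal dot, and re.escape adds the backslashes for you.

```python
import re

# An unescaped dot is a metacharacter: it matches any single character except a newline.
assert re.fullmatch(r".", "a") is not None

# Escaped with a backslash, it matches only a literal ".".
dot_only = re.compile(r"\.")
assert dot_only.fullmatch(".") is not None
assert dot_only.fullmatch("a") is None

# re.escape adds the backslashes needed to treat arbitrary text literally.
escaped = re.escape("a.b")
print(escaped)  # a\.b
```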
The regular expression [A-Z][a-z]* matches an uppercase letter followed by zero or more lowercase letters. The * after the closing square bracket means "zero or more occurrences" of the character class that precedes it.
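To see that pattern in action, here is a minimal Python sketch (the sample sentence is made up for illustration). Note that a lone uppercase letter also matches, because * allows zero lowercase letters.

```python
import re

capitalized = re.compile(r"[A-Z][a-z]*")

# findall returns every non-overlapping match, scanning left to right.
words = capitalized.findall("Hello World fooBar X")
print(words)  # ['Hello', 'World', 'Bar', 'X']
```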
In engines such as Java, .NET and Perl, the metacharacter \Z matches at the end of the entire string, or just before a final line terminator if there is one.
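Engines differ on this detail, so it is worth checking your own. As a hedged example, Python's re module treats \Z more strictly than the description above: it matches only at the very end of the string, while $ is the anchor that tolerates a final newline. The sample string is an assumption for illustration.

```python
import re

text = "abc\n"

# Without re.MULTILINE, "$" matches at the very end or just before a final newline.
dollar = re.search(r"abc$", text)

# In Python, "\Z" matches only at the very end, so the trailing "\n" blocks the match.
strict_end = re.search(r"abc\Z", text)

print(dollar is not None, strict_end is not None)  # True False
```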
In the context of regular expressions, a character class is a set of characters enclosed within square brackets. It specifies the characters that will successfully match a single character from a given input string.
Yes, you can have other ranges. From MSDN - Character Classes in Regular Expressions (bold is mine):
The syntax for specifying a range of characters is as follows:
[firstCharacter-lastCharacter]
where firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points.
So, in the end, [0-Z] will match 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ. You can check the ASCII table for the characters between 0 and Z.
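If you would rather not read the list off an ASCII table, you can ask the engine directly. This is a quick sketch in Python (my choice of language; other engines should agree on the ASCII portion) that collects every ASCII character accepted by [0-Z].

```python
import re

zero_to_z = re.compile(r"[0-Z]")

# Collect every ASCII character (code points 0..127) that the class accepts.
matched = "".join(chr(cp) for cp in range(128) if zero_to_z.fullmatch(chr(cp)))
print(matched)  # 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
```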
As for [a-Z], a has a higher code point than Z, so the range is in reverse order: depending on the engine, it is either rejected as invalid or matches nothing.
Keep in mind that, as a general rule, ranges are defined over Unicode code points, not just ASCII, so a range can be wider than it looks. Ultimately it depends on the implementation, so if in doubt, check it.
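For instance, a range does not have to involve ASCII at all. The following sketch assumes Python's re module, which works on Unicode code points for str patterns; the Greek range is just an illustrative choice.

```python
import re

greek_lower = re.compile(r"[α-ω]")  # U+03B1 .. U+03C9

print(hex(ord("α")), hex(ord("ω")))  # 0x3b1 0x3c9

inside = greek_lower.fullmatch("\u03bb") is not None       # lambda, inside the range
final_sigma = greek_lower.fullmatch("\u03c2") is not None  # final sigma sits inside too
latin_a = greek_lower.fullmatch("a") is not None           # Latin "a" (U+0061) is outside
capital_alpha = greek_lower.fullmatch("\u0391") is not None  # capital alpha is outside

print(inside, final_sigma, latin_a, capital_alpha)  # True True False False
```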
The range [0-Z] is valid. Depending on the regex engine, [a-Z] will either be invalid or be a range that can't match any characters. In a character class range, the start and end characters are just code points, and all characters between those code points are included in the range.
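To find out which of the two behaviours your engine has, just try compiling the pattern. As one data point, Python's re module rejects [a-Z] at compile time with re.error; the helper name try_compile below is hypothetical, introduced only for this sketch.

```python
import re

def try_compile(pattern):
    """Return the compiled pattern, or None if the engine rejects it."""
    try:
        return re.compile(pattern)
    except re.error:
        return None

print(try_compile("[0-Z]") is not None)  # True
print(try_compile("[a-Z]") is not None)  # False
```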
In the case of [0-Z], this is equivalent to the following more readable character class:
[0-9:;<=>?@A-Z]
In the case of [a-Z], this is actually a character class that won't match anything because a has a higher code point than Z.
You can see the code points in the ASCII table at http://www.asciitable.com/.
Ranges depend on the character's (Unicode) value. A range like [0-9] makes sense, but [9-0] does not. Likewise, [a-Z] will be empty because 'a' is greater than 'Z'. (All the uppercase letters come first, and there are intervening characters between 'Z' and 'a'.) Rely on a table of character values (pull up charmap on Windows), and don't get fancy.
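If you don't have a character table handy, the code points are easy to check in code. This Python sketch (an illustration, not a claim about any particular engine beyond Python's re) prints the values for 'Z' and 'a' and the characters in between, which is also why a "fancy" range like [A-z] accepts more than letters.

```python
import re

print(ord("Z"), ord("a"))  # 90 97

# The code points strictly between "Z" and "a".
between = "".join(chr(cp) for cp in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`

# [A-z] is a valid range, but it lets those six extra characters through as well.
a_to_z = re.compile(r"[A-z]")
extras_accepted = all(a_to_z.fullmatch(ch) for ch in between)
print(extras_accepted)  # True
```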