
What characters would be included in regex range a-Z? [duplicate]

Tags:

regex

If I have a regex that is [0-Z] or [a-Z], what characters would it match? Is it a valid regex? Can you have ranges in a regex other than 0-9, a-z and A-Z?

Billy Moon asked Oct 16 '13 17:10



3 Answers

Yes, you can have other ranges. From MSDN - Character Classes in Regular Expressions (bold is mine):

The syntax for specifying a range of characters is as follows:

[firstCharacter-lastCharacter]

where firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points.

So, in the end, [0-Z] will match 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ. You can check the ASCII table for 0-Z.
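One way to check this yourself is to test every ASCII character against the class. This is a minimal sketch using Python's re module (my choice of engine, not something the MSDN page covers); other engines should agree for a plain ASCII range, but check yours.

```python
import re

# Compile the character class under test.
pattern = re.compile(r"[0-Z]")

# Try each of the 128 ASCII characters and keep the ones the class accepts.
matched = "".join(chr(cp) for cp in range(128) if pattern.fullmatch(chr(cp)))

print(matched)       # digits, then : ; < = > ? @, then the uppercase letters
print(len(matched))  # code points 48 through 90 inclusive
```

The result covers every code point from 0 (48) to Z (90), including the punctuation that sits between 9 and A.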

As for [a-Z], a (code point 97) comes after Z (code point 90), so the range is reversed. Depending on the engine, it is either rejected as invalid or matches nothing.

Keep in mind that, as a general rule, ranges are defined over Unicode code points, not just ASCII, so a range can be wider than it looks. Ultimately it depends on the implementation, so if in doubt, check it.

acdcjunior answered Nov 15 '22 05:11


The range [0-Z] is valid. Depending on the regex engine, [a-Z] will either be invalid or be a range that can't match any characters. In a character class range, the start and end characters are just code points, and all characters between those code points are included in the range.

In the case of [0-Z], this is equivalent to the following more readable character class:

[0-9:;<=>?@A-Z]

In the case of [a-Z], this is actually a character class that won't match anything because a has a higher code point than Z.
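As an example of an engine that treats the reversed range as invalid, Python's re module refuses to compile [a-Z] at all. This is a small sketch, with compile_or_error being a helper name I made up for illustration; other engines may behave differently.

```python
import re

def compile_or_error(pattern):
    """Return the compiled pattern, or the re.error raised while compiling it."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        return exc

ok = compile_or_error(r"[0-Z]")   # ascending range: compiles normally
bad = compile_or_error(r"[a-Z]")  # reversed range: re raises an error

print(isinstance(ok, re.Pattern))
print(isinstance(bad, re.error), bad)
```

Returning the exception instead of re-raising it just keeps the two cases side by side; in real code you would normally let the error propagate.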

You can see the code points in the ASCII table at http://www.asciitable.com/.

Andrew Clark answered Nov 15 '22 05:11


Ranges depend on the characters' Unicode values. The range [0-9] makes sense, but [9-0] does not. Likewise, [a-Z] will be empty because 'a' is greater than 'Z'. (All the uppercase letters come first, and there are intervening characters between 'Z' and 'a'.) Rely on a table of character values (pull up charmap on Windows), and don't get fancy.
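If you don't have a character table handy, you can look the values up in code. This is a rough sketch in Python (is_ascending is a name I made up); ord() returns a character's code point, and a range only makes sense when the first value is not greater than the last.

```python
def is_ascending(first, last):
    """Return True when first-last is not a reversed range, by code point."""
    return ord(first) <= ord(last)

# Compare the endpoints of a few candidate ranges.
for first, last in [("0", "9"), ("9", "0"), ("0", "Z"), ("a", "Z")]:
    print(f"[{first}-{last}]", ord(first), ord(last), is_ascending(first, last))

# The characters that sit between 'Z' and 'a' in the table.
between = "".join(chr(cp) for cp in range(ord("Z") + 1, ord("a")))
print(between)
```

The last line lists the six punctuation characters between the uppercase and lowercase letters.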

fred02138 answered Nov 15 '22 03:11