If I have a regex that is <code>[0-Z]</code> or <code>[a-Z]</code> - what characters would it match? Is it valid regex? Can you have ranges in regex outside of <code>0-9</code>, <code>a-z</code> and <code>A-Z</code>?

What characters would be included in regex range a-Z? [duplicate]

3 Answers

Yes, you can have other ranges. From MSDN - Character Classes in Regular Expressions:

The syntax for specifying a range of characters is as follows:

[firstCharacter-lastCharacter]

where firstCharacter is the character that begins the range and lastCharacter is the character that ends the range. A character range is a contiguous series of characters defined by specifying the first character in the series, a hyphen (-), and then the last character in the series. Two characters are contiguous if they have adjacent Unicode code points.
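The definition above can be modelled in a few lines. The Python sketch below models only the quoted wording. It is not how any engine actually implements ranges, and `expand_range` is a hypothetical helper name.

```python
def expand_range(first: str, last: str) -> str:
    """Return every character whose code point lies between first and last, inclusive.

    Hypothetical helper modelling the quoted definition: a range is the
    contiguous series of code points from first up to last. If last comes
    before first, there is no ascending series, so the result is empty.
    """
    return "".join(chr(cp) for cp in range(ord(first), ord(last) + 1))


print(expand_range("0", "9"))        # 0123456789
print(expand_range("a", "f"))        # abcdef
print(repr(expand_range("a", "Z")))  # '' because ord("a") > ord("Z")
```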

So, in the end, [0-Z] will match 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ, that is, code points 0x30 through 0x5A. You can check the ASCII table for 0-Z.
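To check this against a real engine instead of the table, the sketch below uses Python's `re` module. This is an assumption of convenience, since the quoted documentation is for .NET. It tests every ASCII character against `[0-Z]`. Note that `@` (0x40) falls inside the range too.

```python
import re

# Compile the character class once, then test each ASCII code point (0-127).
zero_to_z = re.compile(r"[0-Z]")
matched = "".join(chr(cp) for cp in range(128) if zero_to_z.fullmatch(chr(cp)))

print(matched)  # 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
```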

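As for [a-Z], a (0x61) comes after Z (0x5A), so the pair doesn't specify a contiguous ascending series. Depending on the engine, such a range is either rejected as an error or matches nothing.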
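Whether a reversed range is an error or simply empty depends on the engine. Python's `re` module, used here only as one concrete example, rejects it when the pattern is compiled. `compiles` is a hypothetical helper name.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(r"[0-Z]"))  # True
print(compiles(r"[a-Z]"))  # False: "a" (U+0061) comes after "Z" (U+005A)
print(compiles(r"[Z-a]"))  # True: same endpoints in ascending order
```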

Just keep in mind that, as a general rule, ranges are defined over Unicode code points, not just ASCII, so a range can cover more characters than you might expect. Ultimately it depends on the implementation, so if in doubt, check it.
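To show how wide a Unicode range can reach, the Python sketch below uses the Latin-1 range U+00C0 to U+00FF (À to ÿ). This example was chosen here and is not from the answer. The range looks as if it should cover only accented letters. Scanning the first 0x300 code points shows that it matches 64 characters, and two of them are not letters at all.

```python
import re

# U+00C0 (À) through U+00FF (ÿ), written as escapes to sidestep source-encoding issues.
latin1_range = re.compile("[\u00C0-\u00FF]")

# Scan a window of code points around the range and keep the ones that match.
members = [chr(cp) for cp in range(0x300) if latin1_range.fullmatch(chr(cp))]
non_letters = [ch for ch in members if not ch.isalpha()]

print(len(members))  # 64
print(non_letters)   # ['×', '÷'] (U+00D7 and U+00F7)
```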

answered Nov 15 '22 05:11

acdcjunior

The range [0-Z] is valid. Depending on the regex engine, [a-Z] will either be invalid or be a range that can't match any characters. In a character class range, the start and end characters are just code points, and all characters between those code points are included in the range.

In the case of [0-Z], this is equivalent to the following more readable character class:

[0-9:;<=>?@A-Z]
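One way to confirm that equivalence is to compare the two classes character by character. This Python sketch checks every code point in the Basic Multilingual Plane (U+0000 to U+FFFF). Widening the scan to the full Unicode range works the same way but takes longer.

```python
import re

compact = re.compile(r"[0-Z]")
readable = re.compile(r"[0-9:;<=>?@A-Z]")

# Collect any code point on which the two classes disagree.
mismatches = [
    cp
    for cp in range(0x10000)
    if bool(compact.fullmatch(chr(cp))) != bool(readable.fullmatch(chr(cp)))
]
print(mismatches)  # []
```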

In the case of [a-Z], this is actually a character class that won't match anything because a has a higher code point than Z.

You can see the code points in the ASCII table at http://www.asciitable.com/.

answered Nov 15 '22 05:11

Andrew Clark

Ranges depend on the character's (Unicode) value. A range like [0-9] makes sense, but a range like [9-0] does not. Likewise, a range [a-Z] will be empty (or rejected outright, depending on the engine) because 'a' is greater than 'Z'. (All the uppercase letters come first, and there are intervening characters between 'Z' and 'a'.) Rely on a table of character values (pull up charmap on Windows), and don't get fancy.
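The characters between 'Z' and 'a' also make the tempting alternative [A-z] a trap. It is a valid range, but it includes six punctuation characters along with the letters. The following minimal Python sketch uses [A-Za-z] as the letters-only class for comparison.

```python
import re

a_to_z_mixed = re.compile(r"[A-z]")
letters_only = re.compile(r"[A-Za-z]")

# ASCII characters accepted by [A-z] but not by [A-Za-z].
extras = [
    chr(cp)
    for cp in range(128)
    if a_to_z_mixed.fullmatch(chr(cp)) and not letters_only.fullmatch(chr(cp))
]
print(extras)  # ['[', '\\', ']', '^', '_', '`']
```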

answered Nov 15 '22 03:11

fred02138



Tags:

regex

Billy Moon
