
How to print a Perl character class?

I was in a code review this morning and came across a bit of code that was wrong, but I couldn't tell why.

$line =~ /^[1-C]/;

This line was supposed to match a hex character between 1 and C, but I assume it does not do that. The question is not what it should match, but what does it actually match? Can I print out all the characters in a character class? Something like the line below?

say join(', ', [1-C]);

Alas, that does not work. Here is what I tried:

# Examples:
say join(', ', 1..9);
say join(', ', 'A'..'C');
say join(', ', 1..'C');

# Output
Argument "C" isn't numeric in range (or flop) at X:\developers\PERL\Test.pl line 33.

1, 2, 3, 4, 5, 6, 7, 8, 9
A, B, C
Eric Fossum asked Apr 30 '13 19:04


1 Answer

It matches every code point from U+0031 ("1") to U+0043 ("C").
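To make the consequence concrete, here is a minimal sketch comparing the reviewed class with what the code was probably meant to use. The class [1-9A-C] is an assumption about the intent (uppercase hex digits from 1 to C), and the helper names matches_loose and matches_strict are hypothetical, not from the code under review.

```perl
use strict;
use warnings;

# The class from the code review: one contiguous code point range.
sub matches_loose  { return $_[0] =~ /^[1-C]/    ? 1 : 0 }

# Assumed intent: two separate ranges, digits 1-9 and letters A-C.
sub matches_strict { return $_[0] =~ /^[1-9A-C]/ ? 1 : 0 }

# Punctuation that sits between "9" and "A", plus two real hex digits.
for my $ch (':', '?', '@', '7', 'B') {
    printf "%-3s  [1-C]: %d  [1-9A-C]: %d\n",
        "'$ch'", matches_loose($ch), matches_strict($ch);
}
```

The punctuation characters pass the first check and fail the second, while 7 and B pass both. If lowercase hex digits should also be accepted, the class would need a-c as well.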

The simple answer is to use

map chr, ord("1")..ord("C")

instead of

"1".."C"

as you can see in the following demonstration:

$ perl -Mcharnames=:full -E'
   say sprintf " %s  U+%05X %s", chr($_), $_, charnames::viacode($_)
      for ord("1")..ord("C");
'
 1  U+00031 DIGIT ONE
 2  U+00032 DIGIT TWO
 3  U+00033 DIGIT THREE
 4  U+00034 DIGIT FOUR
 5  U+00035 DIGIT FIVE
 6  U+00036 DIGIT SIX
 7  U+00037 DIGIT SEVEN
 8  U+00038 DIGIT EIGHT
 9  U+00039 DIGIT NINE
 :  U+0003A COLON
 ;  U+0003B SEMICOLON
 <  U+0003C LESS-THAN SIGN
 =  U+0003D EQUALS SIGN
 >  U+0003E GREATER-THAN SIGN
 ?  U+0003F QUESTION MARK
 @  U+00040 COMMERCIAL AT
 A  U+00041 LATIN CAPITAL LETTER A
 B  U+00042 LATIN CAPITAL LETTER B
 C  U+00043 LATIN CAPITAL LETTER C
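For the one-line output the question asked for, the same expression can be wrapped in a small helper. This is only a sketch: range_chars is a hypothetical name, and the guard against a reversed range is an added assumption.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: every character from $first to $last inclusive,
# ordered by code point, the way a regex range first-last is defined.
sub range_chars {
    my ($first, $last) = @_;
    die "Invalid range $first-$last\n" if ord($first) > ord($last);
    return map { chr($_) } ord($first) .. ord($last);
}

say join(', ', range_chars('1', 'C'));
```

This prints the 19 characters listed above on a single line, separated by commas.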

If you have Unicode::Tussle installed, you can get the same output from the following shell command:

unichars -au '[1-C]'
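Without that module, a brute-force check gives a similar answer for any character class, not just a single range. The sketch below tests each code point against the class. The helper name class_members and the default limit of 128 code points (ASCII only) are assumptions made for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return each character among the first $limit
# code points that matches the given character class.
sub class_members {
    my ($class, $limit) = @_;
    $limit //= 0x80;               # assumption: scan ASCII only by default
    my $re = qr/\A$class\z/;       # the class must match the whole character
    return grep { $_ =~ $re } map { chr($_) } 0 .. $limit - 1;
}

say join(', ', class_members('[1-C]'));
say join(', ', class_members('[1-9A-C]'));
```

The first line lists the same 19 characters as the demonstration above, and the second lists only 1 to 9 and A to C. Passing a larger limit scans more of Unicode at the cost of run time.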

You might be interested in wasting time browsing the Unicode code charts. (This particular range is covered by "Basic Latin (ASCII)".)

ikegami answered Sep 25 '22 16:09