
What do the non-printable characters in the Perl symbol table represent?

Tags: perl

I just learned that in Perl, the symbol table for a given package is stored in a hash named after the package with a trailing double colon, so the symbol table for the fictional module Foo::Bar would be %Foo::Bar::. The default symbol table is stored in %main:: (also written %::). Out of curiosity, I wanted to see what was in %main::, so I iterated through each key/value pair in the hash, printing them out as I went:

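#!/usr/bin/perl

use v5.14;
use strict;
use warnings;

# Lexical (my) variables live in scratchpads, not in the symbol table,
# so none of these three will show up in the output.
my $foo;
my $bar;
my %hash;

# %:: is shorthand for %main::, the symbol table of package main.
while ( my ( $key, $value ) = each %:: ) {
    say "Key: '$key' Value '$value'";
}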

The output looked like this:

Key: 'version::' Value '*main::version::'
Key: '/' Value '*main::/'
Key: '' Value '*main::'
Key: 'stderr' Value '*main::stderr'
Key: '_<perl.c' Value '*main::_<perl.c'
Key: ',' Value '*main::,'
Key: '2' Value '*main::2'
...

I was expecting to see the STDOUT and STDERR file handles, and perhaps @INC and %ENV. What I wasn't expecting to see was non-printable characters. What the code block above doesn't show is that the third line of the output actually contained a glyph indicating a non-printable character.

I ran the script and piped it as follows:

perl /tmp/asdf.pl | grep '[^[:print:]]' | while IFS= read -r line
do
    echo "$line"
    od -c <<< "$line"
    echo
done

The output looked like this:

Key: '' Value '*main::'
0000000   K   e   y   :       ' 026   '       V   a   l   u   e       '
0000020   *   m   a   i   n   :   : 026   '  \n
0000032

Key: 'ARNING_BITS' Value '*main::ARNING_BITS'
0000000   K   e   y   :       ' 027   A   R   N   I   N   G   _   B   I
0000020   T   S   '       V   a   l   u   e       '   *   m   a   i   n
0000040   :   : 027   A   R   N   I   N   G   _   B   I   T   S   '  \n
0000060

Key: '' Value '*main::'
0000000   K   e   y   :       ' 022   '       V   a   l   u   e       '
0000020   *   m   a   i   n   :   : 022   '  \n
0000032

Key: 'E_TRIE_MAXBUF' Value '*main::E_TRIE_MAXBUF'
0000000   K   e   y   :       ' 022   E   _   T   R   I   E   _   M   A
0000020   X   B   U   F   '       V   a   l   u   e       '   *   m   a
0000040   i   n   :   : 022   E   _   T   R   I   E   _   M   A   X   B
0000060   U   F   '  \n
0000064

Key: ' Value '*main:'
0000000   K   e   y   :       '  \b   '       V   a   l   u   e       '
0000020   *   m   a   i   n   :   :  \b   '  \n
0000032

Key: '' Value '*main::'
0000000   K   e   y   :       ' 030   '       V   a   l   u   e       '
0000020   *   m   a   i   n   :   : 030   '  \n
0000032

So what are non-printable characters doing in the Perl symbol table? What are they symbols for?

Asked by Barton Chittenden, Apr 10 '13 04:04



1 Answer

Guru is on the right track: specifically, the answer is to be found in perlvar, which says:

"Perl variable names may also be a sequence of digits or a single punctuation or control character. These names are all reserved for special uses by Perl; for example, the all-digits names are used to hold data captured by backreferences after a regular expression match. Perl has a special syntax for the single-control-character names: It understands ^X (caret X) to mean the control-X character. For example, the notation $^W (dollar-sign caret W) is the scalar variable whose name is the single character control-W. This is better than typing a literal control-W into your program.

Since Perl 5.6, Perl variable names may be alphanumeric strings that begin with control characters (or better yet, a caret). These variables must be written in the form ${^Foo}; the braces are not optional. ${^Foo} denotes the scalar variable whose name is a control-F followed by two o's. These variables are reserved for future special uses by Perl, except for the ones that begin with ^_ (control-underscore or caret-underscore). No control-character name that begins with ^_ will acquire a special meaning in any future version of Perl; such names may therefore be used safely in programs. $^_ itself, however, is reserved."
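As a small illustration of the naming rule quoted above, the following sketch converts caret notation into the real variable name, whose first character is a control character. The helper name caret_to_key is hypothetical and exists only for this example; it is not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: turn caret notation such as "^W" or "^WARNING_BITS"
# into the real name, which starts with the matching control character.
sub caret_to_key {
    my ($name) = @_;
    # Flip bit 0x40 of the character after the caret: 'W' (0x57) becomes 0x17.
    $name =~ s/^\^([\x40-\x5f])/chr(ord($1) ^ 64)/e;
    return $name;
}

# Show the code of the first character of each real name.
for my $caret ('^W', '^WARNING_BITS', '^H') {
    printf "%s -> first character 0x%02x\n", $caret, ord(caret_to_key($caret));
}
```

This is why the od dump in the question shows 027 (octal for 0x17) in front of ARNING_BITS: the key is control-W followed by the rest of the name.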

If you want to print those names in a readable way, you could add a line like this to your code:

$key = '^' . ($key ^ '@') if $key =~ /^[\0-\x1f]/;

If the first character of $key is a control character, this will replace it with a caret followed by the corresponding letter (^A for control-A, ^B for control-B, etc.).
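Putting that line into a complete script might look like the sketch below. The sub name readable_key is an assumption made for this example, and the keys are sorted only to keep the listing stable between runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-liner above: rewrite a leading
# control character in caret notation and leave other names untouched.
sub readable_key {
    my ($key) = @_;
    # String XOR with '@' flips bit 0x40 of the first character only;
    # the shorter operand is padded with zero bytes, so the rest is unchanged.
    $key = '^' . ($key ^ '@') if $key =~ /^[\0-\x1f]/;
    return $key;
}

# Walk the main symbol table and print every name in readable form.
for my $key (sort keys %main::) {
    print "Key: '", readable_key($key), "'\n";
}
```

With this in place, the entry that od showed as 027 followed by ARNING_BITS is printed as ^WARNING_BITS, which matches the ${^WARNING_BITS} spelling used in perlvar.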

Answered by Ilmari Karonen, Sep 28 '22 10:09