
Printing UTF8 in Perl?

Tags: unicode, perl

Why does adding the use utf8 pragma produce garbled output (see below), when the output is fine without it?

The code:

use strict;
use v5.10;
use Data::Dumper;
# if I comment this line out, then the results print fine
use utf8;

my $s = {
    'data' => 'The size is 200 μg'
};

say Dumper( $s );

Results without use utf8:

$VAR1 = {
          'data' => 'The size is 200 μg'
        };

Results with use utf8:

$VAR1 = {
          'data' => "The size is 200 \x{3bc}g"
        };

Thanks for any insights

Ricky asked Sep 12 '25 19:09


1 Answer

The output is not garbled. It is a standard Data::Dumper escape, written in the same double-quoted style that the "Useqq" configuration option produces. Data::Dumper is designed for debugging, so this escaping lets you see exactly which characters a string contains even when they may not be printable.

Without use utf8;, your string actually contains the UTF-8 encoded bytes of that character rather than the character itself, since that is what the file contains. You can verify this by checking the length of the string. use utf8; causes the interpreter to decode the source code from UTF-8, including your literal string.
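The length check mentioned above can be sketched as follows. To keep the example independent of how the file is saved, it spells out the UTF-8 bytes of "μ" (CE BC) by hand and uses Encode's decode to stand in for what use utf8; does to the literal; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# What the literal holds WITHOUT "use utf8": the raw UTF-8 bytes from the file.
# "μ" (U+03BC) is stored as the two bytes CE BC.
my $bytes = "The size is 200 \xCE\xBCg";

# What the literal holds WITH "use utf8": decoded characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 19, characters: 18"
```

The one-unit difference is the "μ", which counts as two bytes before decoding and as a single character afterwards.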

To print a string containing such characters, it needs to be encoded back to UTF-8 bytes. You can either do this directly:

use strict;
use warnings;
use utf8;
use Encode 'encode';
print encode 'UTF-8', 'The size is 200 μg';

Or you can set an encoding layer on STDOUT, so that all printed text will be encoded to UTF-8:

use strict;
use warnings;
use utf8;
binmode *STDOUT, ':encoding(UTF-8)';
print 'The size is 200 μg';

Encoding to UTF-8 for Data::Dumper debugging is generally unnecessary, because Data::Dumper already escapes such characters for you.
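As a rough illustration of that point, the sketch below dumps a decoded string without any encoding layer and checks that the dump is plain ASCII. The string is again built from explicit bytes via decode (an assumption made so the example does not depend on the file's encoding), and $dump is just an illustrative name.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Data::Dumper;

# Decoded (character) string, equivalent to the literal under "use utf8".
my $s = { data => decode('UTF-8', "The size is 200 \xCE\xBCg") };

my $dump = Dumper($s);

# No binmode or encode call is needed: the dump holds the
# escape \x{3bc} in place of the character itself.
print $dump;

print "dump is ASCII-only\n" unless $dump =~ /[^\x00-\x7F]/;
```

Because the character is escaped, printing the dump to an unencoded STDOUT should not trigger a "Wide character" warning.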

Grinnz answered Sep 14 '25 13:09