Why does adding the use utf8 pragma produce garbled output (see below), compared to when I don't use this pragma?
The code:
use strict;
use v5.10;
use Data::Dumper;
# if I comment this line out, then the results print fine
use utf8;
my $s = {
'data' => 'The size is 200 μg'
};
say Dumper( $s );
Results without use utf8:
$VAR1 = {
'data' => 'The size is 200 μg'
};
Results with use utf8:
$VAR1 = {
'data' => "The size is 200 \x{3bc}g"
};
Thanks for any insights
It is not garbled. It is Data::Dumper's standard escaped representation of a non-ASCII character, in the same double-quoted style that its "Useqq" configuration option uses. Data::Dumper is designed for debugging, so this escaping lets you see exactly which characters a string contains, even when they may not be printable.
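One way to check that the escaped form is the same string is to evaluate what Data::Dumper printed. This is a minimal sketch, assuming default Data::Dumper settings apart from Terse, which is set here only so that the dump is a bare expression eval can return. The variable names are illustrative.

```perl
use strict;
use warnings;
use v5.10;
use Data::Dumper;

# Terse drops the "$VAR1 = " prefix so the dump is a plain expression.
$Data::Dumper::Terse = 1;

# Written with an escape so this sketch does not depend on the file's encoding.
my $original = "The size is 200 \x{3bc}g";

my $dumped = Dumper($original);   # text containing the \x{3bc} escape
my $copy   = eval $dumped;        # evaluate the dumped text back into a string

say $copy eq $original ? 'same string' : 'different';
```

If the escape were a corruption, the comparison would fail. It should report that the two strings are the same.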
Without use utf8, your string actually contains the UTF-8 encoded bytes of that character rather than the character itself, since that is what the file contains. You can verify this by checking the length of the string. With use utf8, the interpreter decodes the source code from UTF-8, including your literal string, so the string contains the single character U+03BC.
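The length check can be sketched as follows. To keep the example independent of how the file is saved, it writes the two UTF-8 bytes of μ explicitly and decodes them with Encode. That decoding step stands in for what use utf8 does to a literal, which is an assumption of this sketch and not the original code.

```perl
use strict;
use warnings;
use v5.10;
use Encode 'decode';

# The bytes a UTF-8 source file holds for the literal (μ is 0xCE 0xBC).
my $bytes = "The size is 200 \xCE\xBCg";

# Decoding them is, in effect, what use utf8 does to the literal.
my $chars = decode('UTF-8', $bytes);

say length $bytes;                                  # 19 (bytes)
say length $chars;                                  # 18 (characters)
say sprintf 'U+%04X', ord substr($chars, 16, 1);    # U+03BC
```

The one-unit difference in length is the two-byte encoding of μ collapsing into a single character.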
To print such characters, the string needs to be encoded back to UTF-8 bytes. You can either do this directly:
use strict;
use warnings;
use utf8;
use Encode 'encode';
print encode 'UTF-8', 'The size is 200 μg';
Or you can set an encoding layer on STDOUT, so that all printed text will be encoded to UTF-8:
use strict;
use warnings;
use utf8;
binmode *STDOUT, ':encoding(UTF-8)';
print 'The size is 200 μg';
Encoding to UTF-8 for Data::Dumper debugging is generally unnecessary, because it will escape such characters for your view already.