Why doesn't Perl's encoding layer have any effect?

I need to read a file encoded in iso-8859-1.

For some reason I can't get the encoding layer (as described in PerlIO::encoding) to work. Here's a minimal example of what I am doing.

test.txt contains a single pound sign encoded in iso-8859-1.

% iconv -f iso-8859-1 test.txt
£

% hexdump -C test.txt
00000000  a3 0a                                             |..|
00000002

My Perl script:

#!/bin/perl

use warnings;
use strict;

open my $f, "<:encoding(iso-8859-1)", $ARGV[0] or die qq{Could not open $ARGV[0]: $!};

while (<$f>) {
  print;
}

Result:

% ./script.pl test.txt | hexdump -C
00000000  a3 0a                                             |..|
00000002

So the script prints the exact byte sequence it reads, with no conversion performed.

Roman Cheplyaka asked Jan 02 '23 17:01


2 Answers

I had assumed that file handles with no explicit encoding layer default to UTF-8, but apparently that isn't true.

Adding an explicit

binmode(STDOUT, ":utf8");

fixes the problem.
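
To see what the added layer changes, here is a minimal sketch that prints the same one-character string to two in-memory handles: one with no layer, and one with binmode(..., ":utf8") applied. The in-memory handles and the hex_of helper are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: render each byte of a string as two hex digits.
sub hex_of { join ' ', map { sprintf('%02X', ord($_)) } split //, $_[0] }

# What the iso-8859-1 input layer hands back: the character U+00A3 plus a newline.
my $text = "\xA3\n";

# Handle with no encoding layer, like STDOUT in the original script.
open my $plain, '>', \my $plain_bytes or die "open failed: $!";
print {$plain} $text;
close $plain;

# Same print, but with the layer from the fix applied to the handle.
open my $enc, '>', \my $utf8_bytes or die "open failed: $!";
binmode($enc, ':utf8');
print {$enc} $text;
close $enc;

print 'no layer:    ', hex_of($plain_bytes), "\n";   # A3 0A
print ':utf8 layer: ', hex_of($utf8_bytes),  "\n";   # C2 A3 0A
```

The first line of output matches the hexdump in the question; the second is the UTF-8 encoding of the pound sign.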

Roman Cheplyaka answered Jan 11 '23 22:01


A string is a sequence of (32-bit or 64-bit) numbers.

In a string containing decoded text, those numbers are Unicode code points. Under iso-8859-1, byte A3 represents code point U+00A3, so decode("iso-8859-1", "\xA3") returns "\xA3".

You then printed that string, and print("\xA3") on a file handle with no encoding layer produces the byte A3 (since such a handle expects a string of bytes).
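
A short sketch using the core Encode module may make the distinction between decoded text and encoded bytes concrete. The variable names and the hex_of helper are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show each element of a string as hex.
sub hex_of { join ' ', map { sprintf('%02X', ord($_)) } split //, $_[0] }

# The single byte stored in the iso-8859-1 file.
my $bytes = "\xA3";

# Decoding gives a one-character string holding code point U+00A3.
my $text = decode('iso-8859-1', $bytes);

# Encoding that character as UTF-8 gives two bytes.
my $utf8 = encode('UTF-8', $text);

print 'decoded length:      ', length($text), "\n";   # 1
print 'decoded code points: ', hex_of($text), "\n";   # A3
print 'UTF-8 bytes:         ', hex_of($utf8), "\n";   # C2 A3
```

The decoded string and the original byte string look identical when dumped, which is why the script's output was unchanged: the conversion to UTF-8 only happens when the text is encoded on output.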


You didn't specify what you wanted to do, but I'm guessing you wanted the program to convert the input from iso-8859-1 to UTF-8. To achieve that, add

use open ':std', ':encoding(locale)';

or

use open ':std', ':encoding(UTF-8)';

These add an encoding layer to STDIN, STDOUT and STDERR (using binmode), and they set the default encoding layer for any open call in the pragma's lexical scope that doesn't specify layers of its own.
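
As a rough sketch of the second effect, the following writes an iso-8859-1 file, copies it through a handle opened without any explicit layer, and inspects the resulting bytes. The temporary directory, file names and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';   # default layer for the standard handles and for open() in this scope
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($in_path, $out_path) = ("$dir/latin1.txt", "$dir/utf8.txt");

# Create the iso-8859-1 input (bytes A3 0A). The explicit :raw layer is used instead of the default.
open my $w, '>:raw', $in_path or die "Could not open $in_path: $!";
print {$w} "\xA3\n";
close $w or die "Could not close $in_path: $!";

# Explicit layer on input; no layer on output, so the pragma's default applies.
open my $in,  '<:encoding(iso-8859-1)', $in_path or die "Could not open $in_path: $!";
open my $out, '>', $out_path or die "Could not open $out_path: $!";
print {$out} $_ while <$in>;
close $in;
close $out or die "Could not close $out_path: $!";

# Read the result back as raw bytes to see what was actually written.
open my $r, '<:raw', $out_path or die "Could not open $out_path: $!";
my $result = do { local $/; <$r> };
close $r;

print join(' ', map { sprintf('%02X', ord($_)) } split //, $result), "\n";   # C2 A3 0A
```

In the original script only the pragma line is needed, since STDOUT picks up the layer through ':std'.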

ikegami answered Jan 11 '23 21:01