I need to read a file encoded in iso-8859-1.
For some reason I can't get the encoding layer (as described in PerlIO::encoding) to work. Here's a minimal example of what I am doing.
test.txt contains a single pound sign encoded in iso-8859-1.
% iconv -f iso-8859-1 test.txt
£
% hexdump -C test.txt
00000000 a3 0a |..|
00000002
My Perl script:
#!/bin/perl
use warnings;
use strict;
open my $f, "<:encoding(iso-8859-1)", $ARGV[0] or die qq{Could not open $ARGV[0]: $!};
while (<$f>) {
    print;
}
Result:
% ./script.pl test.txt | hexdump -C
00000000 a3 0a |..|
00000002
So the script prints the exact byte sequence it reads, with no conversion performed.
I was assuming that file handles not declared with a specific encoding use the utf-8 encoding by default, but apparently that isn't true.
Adding an explicit binmode(STDOUT, ":utf8"); fixes the problem.
A string is a sequence of (32-bit or 64-bit) numbers.
In a string containing decoded text, those numbers are Unicode Code Points. Since byte A3 represents Unicode Code Point U+00A3 under iso-8859-1, decode("iso-8859-1", "\xA3") therefore returns "\xA3".
You proceeded to print that string, and print("\xA3") on a file handle with no encoding layers produces the byte A3 (since it expects a string of bytes).
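To make the distinction concrete, here is a minimal sketch using the core Encode module. The variable names are only illustrative. It shows that decoding the byte A3 gives a one-character string holding U+00A3, and that a separate encoding step is what produces the UTF-8 bytes C2 A3.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# The single byte A3, as it would be read from an iso-8859-1 file.
my $bytes = "\xA3";

# Decoding yields a one-character string holding Code Point U+00A3.
my $text = decode("iso-8859-1", $bytes);
printf "decoded: U+%04X (length %d)\n", ord($text), length($text);

# Encoding is the step that turns the Code Point into UTF-8 bytes.
my $utf8 = encode("UTF-8", $text);
printf "UTF-8 bytes: %s\n",
    join " ", map { sprintf "%02X", ord($_) } split //, $utf8;
```

This should print "decoded: U+00A3 (length 1)" followed by "UTF-8 bytes: C2 A3". An encoding layer on the output handle performs that second step for you on every print.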
You didn't specify what you wanted to do, but I'm guessing you wanted the program to convert the input from iso-8859-1 to UTF-8. To achieve that, add
use open ':std', ':encoding(locale)';
or
use open ':std', ':encoding(UTF-8)';
These add an encoding layer to STDIN, STDOUT and STDERR (using binmode), and they set the default encoding layer for open in scope.
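Putting it together, a self-contained sketch of the conversion might look like the following. So that it runs without arguments, it first writes a temporary stand-in for test.txt with File::Temp. That part is an assumption for illustration only; in the original script you would keep opening $ARGV[0].

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode STDOUT/STDERR (and decode STDIN) as UTF-8, and make UTF-8 the
# default layer for open calls in this lexical scope.
use open ':std', ':encoding(UTF-8)';

use File::Temp qw( tempfile );

# Illustrative stand-in for test.txt: the byte A3 followed by a newline.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);                 # write raw bytes, no encoding layer
print {$tmp_fh} "\xA3\n";
close($tmp_fh) or die qq{Could not close $tmp_name: $!};

# The explicit layer in the mode takes precedence over the default above.
open my $f, "<:encoding(iso-8859-1)", $tmp_name
    or die qq{Could not open $tmp_name: $!};
my @lines = <$f>;                 # decoded text: U+00A3, then a newline
close($f);

# STDOUT carries a UTF-8 layer, so U+00A3 is written as bytes C2 A3.
print @lines;
```

Piping the output through hexdump -C should now show c2 a3 0a rather than a3 0a.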