The documentation for PerlIO says:
:encoding
Use :encoding(ENCODING) either in open() or binmode() to install a layer that transparently does character set and encoding transformations, for example from Shift-JIS to Unicode. Note that under stdio an :encoding also enables :utf8. See PerlIO::encoding for more information.
Here is a test script:
use feature qw(say);
use strict;
use warnings;
my $fn = 'test.txt';
for my $mode ('>', '>:encoding(utf8)') {
    open(my $fh, $mode, $fn) or die "Cannot open $fn: $!";
    say join ' ', (PerlIO::get_layers($fh));
    close $fh;
}
Output is:
unix perlio
unix perlio encoding(utf8) utf8
Why do I get the additional utf8 layer here?
For reasons that require knowledge of Perl internals.
When you store the number 4 in a scalar, it could be stored as a signed integer, an unsigned integer or a floating point number. You don't know which is used, and you don't have any reason to care which one is used. Perl will automatically convert as needed.
It's the same situation for strings. There are two storage formats for them. Your name is the perfect example. "Håkon Hægland" can be stored as
48.E5.6B.6F.6E.20.48.E6.67.6C.61.6E.64
or as
48.C3.A5.6B.6F.6E.20.48.C3.A6.67.6C.61.6E.64
A flag called UTF8 indicates the choice of storage format. This is transparent to the user (or at least should be).
$ perl -Mutf8 -E'
$_ = "Håkon Hægland";
utf8::downgrade( $d = $_ ); # Converts to the first format mentioned above.
utf8::upgrade( $u = $_ ); # Converts to the second format mentioned above.
say $d eq $u ? "eq" : "ne";
'
eq
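To look at the two storage formats a bit more directly, here is a small sketch. It assumes the debugging helper bytes::length (loaded with "use bytes ()", which does not enable byte semantics) is an acceptable way to peek at the size of the internal buffer; the variable names are mine. It is for inspection only, not something to use in real code.

```perl
use strict;
use warnings;
use bytes ();   # loads bytes::length without turning on byte semantics

# Same 13-character string, written with escapes so the source stays ASCII.
my $name = "H\x{E5}kon H\x{E6}gland";

my $d = $name; utf8::downgrade($d);   # first storage format (one byte per character)
my $u = $name; utf8::upgrade($u);     # second storage format (UTF-8 internally)

for my $s ($d, $u) {
    printf "UTF8 flag: %s, characters: %d, buffer bytes: %d\n",
        (utf8::is_utf8($s) ? "on" : "off"), length($s), bytes::length($s);
}
# Prints:
# UTF8 flag: off, characters: 13, buffer bytes: 13
# UTF8 flag: on, characters: 13, buffer bytes: 15
```

Both scalars hold the same 13 characters; only the size of the underlying buffer and the flag differ.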
While it's transparent to you, it's far from transparent to Perl itself. Whenever you manipulate a string, Perl has to check in which storage format it's stored. For example, if you concatenate two strings, Perl has to make sure they use the same storage format before performing the concatenation, converting one if necessary.
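As a rough illustration of the concatenation case (a sketch with variable names of my own choosing), you can join a string in the first format to one in the second and check what comes out:

```perl
use strict;
use warnings;

my $left  = "H\x{E5}kon";    utf8::downgrade($left);   # UTF8 flag off
my $right = " H\x{E6}gland"; utf8::upgrade($right);    # UTF8 flag on

# Perl reconciles the two storage formats before joining the buffers.
my $joined = $left . $right;

print "joined correctly\n"     if $joined eq "H\x{E5}kon H\x{E6}gland";
print "result has UTF8 flag\n" if utf8::is_utf8($joined);
printf "characters: %d\n", length($joined);   # 13
```

At the Perl level the result is simply the 13-character string you would expect; the conversion happens behind the scenes.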
It's also not transparent to PerlIO. PerlIO, like the rest of Perl, has to deal with the bytes in the string buffer rather than what you see at the Perl level. Sometimes, those bytes are destined to be the string buffer of scalars with the UTF8 flag cleared, and sometimes, those bytes are destined to be the string buffer of scalars with the UTF8 flag set. PerlIO needs to track that. Rather than carrying a flag along from layer to layer, PerlIO adds a :utf8 layer when the scalars obtained by reading from the handle need to have the UTF8 flag set.
So, :encoding converts the bytes that form Håkon Hægland from the specified encoding to
48.C3.A5.6B.6F.6E.20.48.C3.A6.67.6C.61.6E.64
And :utf8 causes the scalar to have the UTF8 flag set, causing the resulting scalar to contain
U+0048.00E5.006B.006F.006E.0020.0048.00E6.0067.006C.0061.006E.0064
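If you want to watch this happen on the reading side, here is a minimal sketch. It assumes a temporary file created with File::Temp and uses ISO-8859-1 as the example source encoding (my choice, not something from the question). The exact layer list printed depends on your platform, so I'm not showing it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the name as 13 raw ISO-8859-1 bytes.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "H\xE5kon H\xE6gland";
close $tmp_fh or die "Cannot close $path: $!";

# Read it back through an :encoding layer.
open(my $in, '<:encoding(iso-8859-1)', $path) or die "Cannot open $path: $!";
my @layers = PerlIO::get_layers($in);
my $name   = <$in>;
close $in;

print "layers: @layers\n";   # includes a trailing utf8 entry
printf "UTF8 flag: %s, characters: %d\n",
    (utf8::is_utf8($name) ? "on" : "off"), length($name);
```

The scalar read from the handle has the UTF8 flag set and holds 13 characters, even though the file on disk is 13 bytes of ISO-8859-1.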