Say I have a sub that receives two arguments: an encoding specification and a file path. The sub then uses that information to open a file for reading, as shown below, stripped down to its essentials:
```perl
use strict;
use warnings;

run({
    encoding       => 'UTF-16---LE',
    input_filename => 'test_file.txt',
});

sub run {
    my $args = shift;
    my ($enc, $fn) = @{ $args }{qw(encoding input_filename)};
    my $is_ok = open my $in,
        sprintf('<:encoding(%s)', $enc),
        $fn
    ;
}
```
Now, this croaks with:
```
Cannot find encoding "UTF-16---LE" at E:\Home\...
```
What is the right way to ensure that $args->{encoding} holds a valid encoding specification before interpolating it into the second argument to open?
The information below is provided in the hope that it will be useful to someone at some point. I am also going to file a bug report.
The documentation for Encode::Alias does not mention find_alias at all. A casual look at Encode/Alias.pm on my Windows system reveals:
```perl
# Public, encouraged API is exported by default
our @EXPORT =
  qw (
  define_alias
  find_alias
);
```
However, note:
```perl
#!/usr/bin/env perl
use 5.014;
use Encode::Alias;
say find_alias('UTF-8')->name;
```
yields:
```
Use of uninitialized value $find in exists at C:/opt/Perl/lib/Encode/Alias.pm line 25.
Use of uninitialized value $find in hash element at C:/opt/Perl/lib/Encode/Alias.pm line 26.
Use of uninitialized value $find in pattern match (m//) at C:/opt/Perl/lib/Encode/Alias.pm line 31.
Use of uninitialized value $find in lc at C:/opt/Perl/lib/Encode/Alias.pm line 40.
Use of uninitialized value $find in pattern match (m//) at C:/opt/Perl/lib/Encode/Alias.pm line 31.
Use of uninitialized value $find in lc at C:/opt/Perl/lib/Encode/Alias.pm line 40.
```
Being 1) lazy and 2) the first to assume I am doing something wrong, I decided to seek others' wisdom.
In any case, the bug is that find_alias is exported as a plain function, while the code assumes it is always called as a class method:
```perl
sub find_alias {
    require Encode;
    my $class = shift;
    my $find  = shift;
    unless ( exists $Alias{$find} ) {
        ...
```
If find_alias is not invoked as a method, the encoding name ends up in $class, and $find is undefined.
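To make the failure mode concrete, here is a minimal sketch contrasting the class-method call with a call to the exported function. Passing 'Encode::Alias' explicitly as the first argument is a workaround I am inferring from the excerpt above (the sub only treats its first argument as $class); it is an assumption, not a documented calling convention.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.014;
use Encode ();
use Encode::Alias;    # exports find_alias and define_alias by default

# Intended usage: call find_alias as a class method, so the class name
# fills $class and the encoding name fills $find.
my $by_method = Encode::Alias->find_alias('UTF-8');
say $by_method->name;

# Workaround (assumption) for the exported function: supply the class
# name yourself so the encoding name still lands in the second slot.
my $by_function = find_alias('Encode::Alias', 'UTF-8');
say $by_function->name;
```

Both calls should print the same canonical name and emit none of the "uninitialized value $find" warnings shown above.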
HTH.
$octets = encode(ENCODING, $string [, CHECK])

Encodes a string from Perl's internal form into ENCODING and returns a sequence of octets. ENCODING can be either a canonical name or an alias. For encoding names and aliases, see Defining Aliases. For CHECK, see Handling Malformed Data.
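As a small illustration of "canonical name or alias", the sketch below encodes the euro sign through the alias 'latin9' and through the canonical name 'iso-8859-15' (the mapping shown in the REPL session further down), and encodes 'A' as UTF-16LE. The sample characters are my own choice.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $euro = "\x{20AC}";    # EURO SIGN as a Perl character string

# 'latin9' is an alias that Encode resolves to the canonical 'iso-8859-15'.
my $via_alias     = encode('latin9', $euro);
my $via_canonical = encode('iso-8859-15', $euro);

# In ISO-8859-15 the euro sign is the single octet 0xA4.
printf "%vX %vX\n", $via_alias, $via_canonical;    # A4 A4

# In UTF-16LE, 'A' (U+0041) becomes two octets: 0x41 then 0x00.
printf "%vX\n", encode('UTF-16LE', 'A');          # 41.0
```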
String objects use UTF-16 encoding internally, and a String cannot be modified once it is created. The only way to get the text in a different encoding is as a byte[] array.
You do that by calling str.valid_encoding? on a String str that is in UTF-8 encoding. Programmatically, you cannot (or at least not easily, and certainly not reliably) detect invalid data in a single-byte encoding such as CP1252.
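The snippet above is Ruby. A comparable UTF-8 validity check in Perl can be sketched, under my own assumptions, by decoding the octets with Encode's FB_CROAK check flag and seeing whether decode dies. is_valid_utf8 is a hypothetical helper name. The check works only because UTF-8 has structural rules; as noted above, it does not carry over to a single-byte encoding such as CP1252.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if $octets is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($octets) = @_;
    # FB_CROAK makes decode die on malformed input; LEAVE_SRC keeps decode
    # from modifying $octets in place (which a CHECK value would otherwise do).
    my $ok = eval {
        Encode::decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

print is_valid_utf8("caf\xC3\xA9") ? "valid\n" : "invalid\n";  # valid
print is_valid_utf8("caf\xE9")     ? "valid\n" : "invalid\n";  # invalid
```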
UTF-8 encodes a character into one, two, three, or four bytes. UTF-16 encodes a Unicode character into either two or four bytes. The names reflect the size of the code unit each one uses: 8 bits for UTF-8 and 16 bits for UTF-16.
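The byte counts above can be checked with Encode. This sketch picks one character from each UTF-8 length class (the sample characters are my own choice) and prints the octet length of each in UTF-8 and UTF-16LE; byte_lengths is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: (UTF-8 length, UTF-16LE length) in octets for one character.
sub byte_lengths {
    my ($char) = @_;
    return (
        length(encode('UTF-8',    $char)),
        length(encode('UTF-16LE', $char)),
    );
}

for my $char ("A", "\x{E9}", "\x{20AC}", "\x{1F600}") {
    my ($u8, $u16) = byte_lengths($char);
    printf "U+%04X  UTF-8: %d  UTF-16LE: %d\n", ord($char), $u8, $u16;
}
# U+0041: 1 and 2, U+00E9: 2 and 2, U+20AC: 3 and 2, U+1F600: 4 and 4
```

U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair (two 16-bit units) for it.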
Encode::Alias->find_alias($encoding_name) returns an encoding object on success, whose name method gives the canonical encoding name, and a false value (undef) on failure:
```
$ Encode::Alias->find_alias('UTF-16---LE')
$ Encode::Alias->find_alias('UTF-16 LE')
Encode::Unicode {
    Parents       Encode::Encoding
    Linear @ISA   Encode::Unicode, Encode::Encoding
    public methods (6) : bootstrap, decode, decode_xs, encode, encode_xs, renew
    private methods (0)
    internals: {
        endian   "v",
        Name     "UTF-16LE",
        size     2,
        ucs2     ""
    }
}
$ Encode::Alias->find_alias('Latin9')
Encode::XS {
    public methods (9) : cat_decode, decode, encode, mime_name, name, needs_lines, perlio_ok, renew, renewed
    private methods (0)
    internals: 140076283926592
}
$ Encode::Alias->find_alias('UTF-16 LE')->name
UTF-16LE
$ Encode::Alias->find_alias('Latin9')->name
iso-8859-15
```
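Putting this together for the original run sub, here is a minimal sketch that validates the encoding before it is interpolated into open. It swaps in Encode::find_encoding, the documented lookup function in Encode, which as I understand it accepts canonical names as well as aliases; Encode::Alias->find_alias, called as a class method as above, should also work for alias lookups. canonical_encoding is a hypothetical helper name, and having run return the handle is my own addition for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: map an encoding specification to Encode's
# canonical name, or return undef if Encode does not recognise it.
sub canonical_encoding {
    my ($spec) = @_;
    return undef unless defined $spec && length $spec;
    my $obj = Encode::find_encoding($spec) or return undef;
    return $obj->name;
}

sub run {
    my ($args) = @_;
    my ($enc, $fn) = @{$args}{qw(encoding input_filename)};

    # Reject unknown encodings before building the layer string.
    my $canonical = canonical_encoding($enc)
        or die sprintf "Unknown encoding '%s'\n", $enc // '(undef)';

    open my $in, "<:encoding($canonical)", $fn
        or die "Cannot open '$fn': $!\n";
    return $in;    # returned only so callers can read from it
}
```

With this version, run({ encoding => 'UTF-16---LE', input_filename => 'test_file.txt' }) dies with "Unknown encoding 'UTF-16---LE'" before open is ever called, while 'UTF-16LE' or an alias such as 'Latin9' passes through as its canonical name.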