I have a string $data that is actually encoded in UTF-8, but suppose I don't know whether it is UTF-8 or ISO-8859-1. I want to use the Perl Encode::Guess module to find out which one it is, and I'm having trouble figuring out how this module works.
I have tried the following four methods (from http://perldoc.perl.org/Encode/Guess.html):
use Encode::Guess qw/utf8 latin1/;
my $decoder = guess_encoding($data);
print "$decoder\n";
Result: iso-8859-1 or utf8
use Encode::Guess qw/utf8 latin1/;
my $enc = guess_encoding($data, qw/utf8 latin1/);
ref($enc) or die "Can't guess: $enc";
my $utf8 = $enc->decode($data);
print "$utf8\n";
Result: Can't guess: iso-8859-1 or utf8 at encodage-windows.pl line 25, line 18110.
use Encode::Guess qw/utf8 latin1/;
my $decoder = Encode::Guess->guess($data);
die $decoder unless ref($decoder);
my $utf8 = $decoder->decode($data);
print "$utf8\n";
Result: iso-8859-1 or utf8 at encodage-windows.pl line 30, line 18110.
use Encode::Guess qw/utf8 latin1/;
my $utf8 = Encode::decode("Guess", $data);
print "$utf8\n";
Result: iso-8859-1 or utf8 at /usr/local/lib/perl5/Encode.pm line 175.
My first question is: which one of these methods am I supposed to use (if any)? And my second question: what changes should I make to make this work?
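I normally check the possible encodings one at a time, like this:
use Encode::Guess;  # no suspects on this line; any listed here would apply to every call
my $decoder = guess_encoding($data, 'utf8');
# guess_encoding returns an object on success and an error string otherwise
$decoder = guess_encoding($data, 'iso-8859-1') unless ref $decoder;
die $decoder unless ref $decoder;
printf "Decoding as %s\n\n", $decoder->name;
$data = $decoder->decode($data);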
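This chooses UTF-8 if possible. Otherwise it tries ISO-8859-1, and either chooses that or dies with the error. Each call is therefore a simple yes/no test for one encoding, so the module can never come back with two possible results, which is the error you're getting. Your versions are ambiguous because every byte sequence is valid ISO-8859-1: when both suspects are offered at once, valid UTF-8 data always matches both.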
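The same idea can be wrapped in a small helper. This is only a sketch: decode_first_match is a hypothetical name of my own, not something Encode::Guess provides, and the two sample byte strings are made up for illustration. It assumes the module is loaded without suspects on the use line and that the candidates are passed in order of preference.

```perl
use strict;
use warnings;
use Encode::Guess;    # no suspects here, so each call names its own candidate

# Hypothetical helper (not part of Encode::Guess): try the candidates in
# order and return the encoding name and decoded text for the first match.
sub decode_first_match {
    my ($bytes, @candidates) = @_;
    my $decoder;
    for my $candidate (@candidates) {
        $decoder = guess_encoding($bytes, $candidate);
        last if ref $decoder;    # an object means an unambiguous guess
    }
    die "Can't guess: $decoder\n" unless ref $decoder;
    return ($decoder->name, $decoder->decode($bytes));
}

# Sample input: "café" as UTF-8 bytes, then as ISO-8859-1 bytes.
for my $bytes ("caf\xC3\xA9", "caf\xE9") {
    my ($name, $text) = decode_first_match($bytes, 'utf8', 'iso-8859-1');
    printf "%d bytes decoded as %s into %d characters\n",
        length($bytes), $name, length($text);
}
```

The first sample should come back as utf8 and the second as iso-8859-1. The order of the candidates matters: putting iso-8859-1 first would make it win for any input that contains bytes outside ASCII, because it accepts every byte.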