I am trying to use the XML::RAI Perl module on UTF-8 encoded text, and I keep getting an error I don't really understand. Here is the code (it isn't meant to do anything useful yet):
use HTTP::Request;
use LWP::UserAgent;
use XML::RAI;
use Encode;

my $ua = LWP::UserAgent->new;

sub readFromWeb {
    my $address  = shift;
    my $request  = HTTP::Request->new( GET => $address );
    my $response = $ua->request( $request );
    return unless $response->code == 200;
    return decode("utf8", $response->content());
}

sub readFromRSS {
    my $address = shift;
    my $content = readFromWeb $address;
    my $rai     = XML::RAI->parse_string($content);
    #this line "causes" the error
}

readFromRSS("http://aktualne.centrum.cz/export/rss-hp.phtml");
#I am testing it on this particular RSS
The error is:
Cannot decode string with wide characters at /usr/lib/perl5/5.8.8/i686-linux/Encode.pm line 166.
I don't have a clue whether that's my fault or the fault of XML::RAI. I don't see where these wide characters could come from if $content is already decoded from UTF-8...
Edit: for some reason I still don't understand, removing the "decode" part actually solved the problem.
The problem is double-decoding. XML::RAI::parse_string() apparently expects a UTF-8 encoded document (raw bytes) and does the decoding itself. If you pass in a string that is already decoded, decoding it a second time will fail, of course:
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode );
use LWP::Simple qw( get );
my $xml = get("http://aktualne.centrum.cz/export/rss-hp.phtml");
$xml = decode('UTF-8', $xml);
$xml = decode('UTF-8', $xml); # dies: Cannot decode string with wide characters ...
So just skip the decode() step and you'll be fine.
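To see the difference between the byte string and the decoded character string without touching the network, here is a minimal offline sketch. The two-byte sample (the UTF-8 encoding of "č", U+010D) is an assumption chosen for illustration, not something taken from the feed; the second decode() is wrapped in eval so the script reports the failure instead of dying.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw( decode );

# Assumed sample data: the UTF-8 bytes for "č" (U+010D),
# as they would arrive in an HTTP response body.
my $bytes = "\xC4\x8D";

# First decode: bytes -> one Perl character.
my $chars = decode('UTF-8', $bytes);
print length($bytes), " byte(s), ", length($chars), " character(s)\n";

# Second decode: the input now holds a character above 0xFF,
# so it cannot be treated as a byte string and decode() croaks.
my $ok  = eval { decode('UTF-8', $chars); 1 };
my $err = $@;
print "second decode failed: $err" unless $ok;
```

The first line of output should read "2 byte(s), 1 character(s)". A parser that decodes its own input needs to be handed something like $bytes, not $chars, which is why returning $response->content() unchanged from readFromWeb works.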