This question is related to another question: "Hash keys encoding: Why do I get here with Devel::Peek::Dump two different results?"
When I uncomment the # utf8::upgrade( $name ); line or comment out the $hash{'müller'} = 'magenta'; line, it works.
#!/usr/bin/env perl
use warnings;
use 5.014;
use utf8;
binmode STDOUT, ':encoding(utf-8)';
use XML::LibXML;
# Hash read in from a file:
# ...
my %hash = ( 'müller' => 'green', 'schneider' => 'blue', 'bäcker' => 'red' );
# ...
# change or add something
$hash{'müller'} = 'magenta';
# writing Hash to xml file
my $doc = XML::LibXML::Document->new('1.0', 'UTF-8' );
my $root = $doc->createElement( 'my_test' );
for my $name ( keys %hash ) {
    # utf8::upgrade( $name );
    my $tag = $doc->createElement( 'item' );
    $tag->setAttribute( 'name' => $name );
    my $tag_color = $doc->createElement( 'color' );
    $tag_color->appendTextNode( $hash{$name} );
    $tag->appendChild( $tag_color );
    $root->appendChild( $tag );
}
$doc->setDocumentElement($root);
say $doc->serialize( 1 );
$doc->toFile( 'my_test.xml', 1 );
Output:
error : string is not in UTF-8
encoding error : output conversion failed due to conv error, bytes 0xFC 0x6C 0x6C 0x65
I/O error : encoder error
<?xml version="1.0" encoding="ISO-8859-1"?>
<my_test>
<item name="m
i18n error : output conversion failed due to conv error, bytes 0xFC 0x6C 0x6C 0x65
I/O error : encoder error
According to XML::LibXML, whether 'müller' eq 'müller' is true or false depends on how the strings have been stored internally. That's a bug. Specifically, assigning meaning to the UTF8 flag is known as "The Unicode Bug", and XML::LibXML is documented to do exactly that in the "encodings support" section of this page.
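To make "stored internally" concrete, here is a minimal sketch that builds the same one-character string in both storage formats. It uses only the built-in utf8::upgrade and utf8::is_utf8 functions; the variable names are just for illustration.

```perl
use strict;
use warnings;

# Build the same one-character string in both internal storage formats.
my $down = "\xFC";     # U+00FC, stored as a single byte (UTF8=0)
my $up   = "\xFC";
utf8::upgrade($up);    # same character, now stored in the UTF8=1 format

# Perl's own string comparison treats them as the same string ...
my $equal = $down eq $up ? 1 : 0;

# ... but the internal flag differs, and that flag is what
# flag-sensitive code ends up looking at.
my $flag_down = utf8::is_utf8($down) ? 1 : 0;
my $flag_up   = utf8::is_utf8($up)   ? 1 : 0;

print "equal=$equal flag_down=$flag_down flag_up=$flag_up\n";
```

The two scalars compare equal under eq, yet only one of them has the flag set. Code that handles the two differently is showing the bug described above.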
The bug is known, but it can't be fixed cleanly for backwards compatibility reasons. Perl provides two tools to work around instances of The Unicode Bug:
utf8::upgrade( $sv ); # Switch to the UTF8=1 storage format
utf8::downgrade( $sv ); # Switch to the UTF8=0 storage format
The former would be the appropriate tool to use here.
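sub _up { my ($s) = @_; utf8::upgrade($s); $s }   # returns an upgraded copy
$tag->setAttribute( 'name' => _up $name );        # the failing bytes (0xFC ...) come from the name
$tag_color->appendTextNode( _up $hash{$name} );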
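As a rough check that upgrading is safe, the sketch below shows that utf8::upgrade and utf8::downgrade only change the storage format, not the string value or its length in characters. The sample string and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical sample data: "müller" written with an escape so that the
# encoding of this file does not matter.
my $name = "m\xFCller";

my $copy = $name;
utf8::upgrade($copy);                        # switch the copy to UTF8=1 storage
my $same_after_upgrade   = $copy eq $name ? 1 : 0;
my $length_after_upgrade = length $copy;     # still counted in characters

# Downgrading works here because every character is below U+0100.
my $downgraded_ok = utf8::downgrade($copy) ? 1 : 0;

# A character above U+00FF cannot be stored in the UTF8=0 format.
# A true second argument makes downgrade return false instead of dying.
my $wide    = "\x{263A}";
my $wide_ok = utf8::downgrade($wide, 1) ? 1 : 0;

print "same=$same_after_upgrade length=$length_after_upgrade ",
      "downgraded=$downgraded_ok wide=$wide_ok\n";
```

Upgrading always succeeds, which is one reason it is the safer direction when handing strings to a module that looks at the flag.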
Note: You can call utf8::upgrade even if your script doesn't have use utf8;. Only add use utf8; if your source code is encoded as UTF-8.
I get the error if I save your script as iso-8859-1. If I save it as utf-8, it works.
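For what it's worth, the bytes in the error message (0xFC 0x6C 0x6C 0x65) are what "ülle" looks like in ISO-8859-1. This small sketch uses the core Encode module to print both encodings of the name for comparison; hex_bytes is a helper made up for this example, and the script only shows the byte sequences, it does not reproduce the libxml2 error itself.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $name = "m\xFCller";    # "müller" as Perl characters

# Hypothetical helper: format a byte string as a list of hex values.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '0x%02X', ord($_) } split //, $bytes;
}

my $latin1 = hex_bytes( encode( 'ISO-8859-1', $name ) );
my $utf8   = hex_bytes( encode( 'UTF-8',      $name ) );

print "ISO-8859-1: $latin1\n";
print "UTF-8:      $utf8\n";
```

In ISO-8859-1 the ü is the single byte 0xFC, while in UTF-8 it is the two bytes 0xC3 0xBC. A lone 0xFC followed by an ASCII letter is not valid UTF-8, which matches the "string is not in UTF-8" message.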