I use this Perl code to read XML from one file and write it to another (my full script also has code to add attributes):
```perl
#!/usr/bin/perl -w
use strict;
use XML::DOM;
use XML::Simple;
my $num_args = $#ARGV + 1;
if ($num_args != 2) {
print "\nUsage: ModifyXML.pl inputXML outputXML\n";
exit;
}
my $inputPath = $ARGV[0];
my $outputPath = $ARGV[1];
open(inputXML, "$inputPath") || die "Cannot open $inputPath \n";
my $parser = XML::DOM::Parser->new();
my $data = $parser->parsefile($inputPath) || die "Error parsing XML File";
open my $fh, '>:utf8', "$outputPath" or die "Can't open $outputPath for writing: $!\n";
$data->printToFileHandle($fh);
close($fh);
close(inputXML);
```
However, this doesn't preserve characters such as line breaks. For example, this XML:
```xml
<?xml version="1.0" encoding="utf-8"?>
<Test>
<Notification Content="test1 testx 
test2
test3
" Type="Test1234">
</Notification>
</Test>
```
becomes this:
```xml
<?xml version="1.0" encoding="utf-8"?>
<Test>
<Notification Content="test1 testx
test2
test3
" Type="Test1234">
</Notification>
</Test>
```
I suspect I'm not writing to the file properly.
Use XML::LibXML, for example. The main modules involved are XML::LibXML::Parser and XML::LibXML::DOM (along with others), and the object returned by the parser is generally an XML::LibXML::Document.
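```perl
use warnings 'all';
use strict;
use XML::LibXML;

my $inputPath  = 'with_encodings.xml';
my $outputPath = 'keep_encodings.xml';

# Create a parser object (options can be set on it before parsing)
my $reader = XML::LibXML->new();
# Parse the file; no_blanks => 1 drops whitespace-only text nodes
my $doc = $reader->load_xml(location => $inputPath, no_blanks => 1);

# Show the serialized document, then write it to the output file
print $doc->toString();
my $state = $doc->toFile($outputPath);
```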
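A short sketch of why the serializer matters, assuming (this is not visible in the extracted example above) that the line breaks in the original Content attribute were written as character references such as `&#xA;`. A conforming parser turns a literal newline inside an attribute value into a space (XML attribute-value normalization) but keeps a character reference as a real newline; XML::LibXML then escapes that newline as `&#10;` when serializing, so it survives a second parse. The sample XML strings are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up input: the line breaks are character references (&#xA;).
my $xml = '<Test><Notification Content="test1&#xA;test2&#xA;test3" Type="Test1234"/></Test>';

my $doc = XML::LibXML->load_xml(string => $xml);
my ($note) = $doc->findnodes('//Notification');

# After parsing, the attribute value contains real newline characters.
my $content = $note->getAttribute('Content');
print $content eq "test1\ntest2\ntest3" ? "newlines kept\n" : "newlines lost\n";

# libxml2 escapes the newlines again when serializing, so the output
# contains Content="test1&#10;test2&#10;test3".
my $out = $doc->toString();
print $out;

# For contrast: a literal newline in an attribute is normalized to a space.
my $literal = XML::LibXML->load_xml(string => qq{<a x="1\n2"/>});
my $x = $literal->documentElement->getAttribute('x');
print "literal newline parsed as: '$x'\n";    # prints '1 2'
```

The symptom in the question is consistent with the newlines being written back literally, which is exactly the case the last three lines show being collapsed to spaces on the next read.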
We don't have to create a parser object first; we can call XML::LibXML->load_xml directly. I do it here only as an example, since this way one can use methods on $reader to set up options (encodings, for example) before parsing but outside of the constructor.
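A minimal sketch of the two call styles, parsing from in-memory strings instead of files so it runs on its own. `keep_blanks(0)` is only an example of an option set on the parser object before parsing (it has the same effect as `no_blanks => 1`); the XML snippets are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Class-method call: no explicit parser object is needed.
my $doc1 = XML::LibXML->load_xml(string => '<Test><Notification/></Test>');

# Object call: configure the parser first, then parse. When load_xml is
# called on an object, the parser's existing settings are carried over.
my $parser = XML::LibXML->new();
$parser->keep_blanks(0);    # drop whitespace-only text nodes
my $doc2 = $parser->load_xml(string => "<Test>\n  <Notification/>\n</Test>");

print $doc1->toString();
print $doc2->toString();
```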
XML::LibXML is also far more convenient for processing the document.
XML::Twig should also preserve encodings, and it too is far better suited for processing.