For the life of me I can't figure out the proper code to access the comment lines in my XML file. Do I use findnodes, find, or getElementsByTagName (doubt it)?
Am I even making the correct assumption that these comment lines are accessible? I would hope so, as I know I can add a comment.
The type number for a comment node is 8, so they must be parseable.
Ultimately, what I want to do is delete them.
my @nodes = $dom->findnodes("//*");
foreach my $node (@nodes) {
print $node->nodeType, "\n";
}
<TT>
<A>xyz</A>
<!-- my comment -->
</TT>
If all you need to do is produce a copy of the XML with comment nodes removed, then the first parameter of toStringC14N is a flag that says whether you want comments in the output. Omitting all parameters implicitly sets the first to a false value, so
$doc->toStringC14N
will reproduce the XML trimmed of comments. Note that the Canonical XML form specified by C14N doesn't include an XML declaration header. It is always XML 1.0 encoded in UTF-8.
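As a minimal sketch of that flag, the snippet below passes a true value and then no value as the first argument to toStringC14N. The variable names and the inline sample document are only illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative sample document (same shape as the one in the question)
my $doc = XML::LibXML->load_xml(
    string => '<TT><A>xyz</A><!-- my comment --></TT>'
);

# A true first argument keeps comments in the canonical output
my $with_comments = $doc->toStringC14N(1);

# No arguments: the flag defaults to false, so comments are dropped
my $without_comments = $doc->toStringC14N;

print "with:    $with_comments\n";
print "without: $without_comments\n";
```

Neither call modifies the document itself; only the serialized string differs.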
findnodes with the XPath expression //comment() will locate them for you, and unbindNode will remove them from the XML. This program demonstrates both techniques.
use strict;
use warnings;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(string => <<END_XML);
<TT>
<A>xyz</A>
<!-- my comment -->
</TT>
END_XML
# Print everything
print $doc->toString, "\n";
# Print without comments
print $doc->toStringC14N, "\n\n";
# Remove comments and print everything
$_->unbindNode for $doc->findnodes('//comment()');
print $doc->toString;
Output:
<?xml version="1.0"?>
<TT>
<A>xyz</A>
<!-- my comment -->
</TT>
<TT>
<A>xyz</A>
</TT>
<?xml version="1.0"?>
<TT>
<A>xyz</A>
</TT>
Update
To select a specific comment, you can add a predicate expression to the XPath selector. To find the specific comment in your example data you could write
$doc->findnodes('//comment()[. = " my comment "]')
Note that the text of the comment includes everything between the opening <!-- and the closing -->, so spaces are significant, as shown in that call.
If you want to make things a bit more lax, you could use normalize-space, which removes leading and trailing whitespace and contracts every sequence of whitespace within the string to a single space. Now you can write
$doc->findnodes('//comment()[normalize-space(.) = "my comment"]')
And the same call would find your comment even if it looked like this.
<!--
my
comment
-->
Finally, you can make use of contains, which, as you would expect, simply checks whether one string contains another. Using that you could write
$doc->findnodes('//comment()[contains(., "comm")]')
The one to choose depends on your requirement and your situation.
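To tie the predicates back to the original goal of deleting comments, here is a small sketch that removes only the matching comment and leaves the others in place. The second comment in the sample data and the @remaining variable are assumptions added for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample data with an extra comment (added here for illustration only)
my $doc = XML::LibXML->load_xml(string => <<'END_XML');
<TT>
<A>xyz</A>
<!-- my comment -->
<!-- keep this one -->
</TT>
END_XML

# Unbind only the comment whose whitespace-normalized text matches
$_->unbindNode
    for $doc->findnodes('//comment()[normalize-space(.) = "my comment"]');

# Collect the text of whatever comments are left
my @remaining = map { $_->textContent } $doc->findnodes('//comment()');

print $doc->toString;
```

Swapping in the exact-match or contains predicate works the same way; only the XPath string changes.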
According to the XPath spec:
* is a test that matches element nodes of any name. Comment nodes aren't element nodes.
comment() is a test that matches comment nodes.
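A quick way to see the difference is to print the node types each test returns. This is a minimal sketch using an inline copy of the sample document; the array names are illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(
    string => '<TT><A>xyz</A><!-- my comment --></TT>'
);

# //* selects element nodes only (node type 1)
my @element_types = map { $_->nodeType } $doc->findnodes('//*');

# //comment() selects comment nodes only (node type 8)
my @comment_types = map { $_->nodeType } $doc->findnodes('//comment()');

print "//*         : @element_types\n";   # prints 1 1
print "//comment() : @comment_types\n";   # prints 8
```

This is why the loop in the question, which uses //*, never reports a node of type 8.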
Untested:
for my $comment_node ($doc->findnodes('//comment()')) {
$comment_node->parentNode->removeChild($comment_node);
}
I know it's not XML::LibXML, but here is another way to remove comments easily, using the XML::Twig module:
#!/usr/bin/env perl
use warnings;
use strict;
use XML::Twig;
my $twig = XML::Twig->new(
pretty_print => 'indented',
comments => 'drop'
)->parsefile( shift )->print;
Run it like:
perl script.pl xmlfile
That yields:
<TT>
<A>xyz</A>
</TT>
The comments option also accepts the value process, which lets you work with comments using the XPath value #COMMENT.