Perl XML::LibXML: how to access comment nodes

Question

For the life of me I can't figure out the proper code to access the comment lines in my XML file. Do I use findnodes, find, or getElementsByTagName (doubt it)?

Am I even making the correct assumption that these comment lines are accessible? I would hope so, as I know I can add a comment.

The type number for a comment node is 8, so they must be parseable.

Ultimately, what I want to do is delete them.

my @nodes = $dom->findnodes("//*");

foreach my $node (@nodes) {
  print $node->nodeType, "\n";
}

<TT>
 <A>xyz</A>
 <!-- my comment -->
</TT>

Borodin · Accepted Answer

If all you need to do is produce a copy of the XML with comment nodes removed, then the first parameter of toStringC14N is a flag that says whether you want comments in the output. Omitting all parameters implicitly sets the first to a false value, so
```
$doc->toStringC14N
```

will reproduce the XML trimmed of comments. Note that the Canonical XML form specified by C14N doesn't include an XML declaration header. It is always XML 1.0 encoded in UTF-8.
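As a small sketch of that flag, the snippet below serializes the same document twice, once with the first parameter omitted and once with it set to a true value. The one-line sample document is made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative sample document (not from the question)
my $doc = XML::LibXML->load_xml(
    string => '<TT><A>xyz</A><!-- my comment --></TT>'
);

# First parameter omitted: treated as false, so comments are dropped
my $without = $doc->toStringC14N;

# First parameter true: comments are kept in the canonical output
my $with = $doc->toStringC14N(1);

print "without comments: $without\n";
print "with comments:    $with\n";
```

Neither form changes the in-memory document; only the serialized string differs.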

If you need to remove the comments from the in-memory structure of the document before processing it further, then findnodes with the XPath expression //comment() will locate them for you, and unbindNode will remove them from the XML.

This program demonstrates both approaches.

use strict;
use warnings;

use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<END_XML);
<TT>
 <A>xyz</A>
 <!-- my comment -->
</TT>
END_XML

# Print everything
print $doc->toString, "\n";

# Print without comments
print $doc->toStringC14N, "\n\n";

# Remove comments and print everything
$_->unbindNode for $doc->findnodes('//comment()');
print $doc->toString;

output

<?xml version="1.0"?>
<TT>
 <A>xyz</A>
 <!-- my comment -->
</TT>

<TT>
 <A>xyz</A>

</TT>

<?xml version="1.0"?>
<TT>
 <A>xyz</A>

</TT>

Update

To select a specific comment, you can add a predicate expression to the XPath selector. To find the specific comment in your example data you could write

$doc->findnodes('//comment()[. = " my comment "]')

Note that the text of the comment includes everything between the leading <!-- and the trailing -->, so spaces are significant as shown in that call.

If you want to make things a bit more lax, you could use normalize-space, which removes leading and trailing whitespace, and contracts every sequence of whitespace within the string to a single space. Now you can write

$doc->findnodes('//comment()[normalize-space(.) = "my comment"]')

And the same call would find your comment even if it looked like this.

<!--
my
comment
-->

Finally, you can make use of contains, which, as you would expect, simply checks whether one string contains another. Using that you could write

$doc->findnodes('//comment()[contains(., "comm")]')

The one to choose depends on your requirement and your situation.
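Putting a predicate together with unbindNode, here is a minimal sketch that deletes one specific comment and leaves the others alone. The second comment in the sample data is an addition for illustration only.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample data: the second comment is hypothetical, added to show
# that only the matching comment is removed
my $doc = XML::LibXML->load_xml(string => <<'END_XML');
<TT>
 <A>xyz</A>
 <!-- my comment -->
 <!-- keep this one -->
</TT>
END_XML

# Remove only comments whose whitespace-normalized text is "my comment"
for my $comment ($doc->findnodes('//comment()[normalize-space(.) = "my comment"]')) {
    $comment->unbindNode;
}

# Collect the text of whatever comments are left
my @remaining = map { $_->textContent } $doc->findnodes('//comment()');
print "remaining: [$_]\n" for @remaining;
```

Swapping the predicate for the exact-match or contains form changes which comments are selected without touching the removal loop.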

ikegami · Answer

According to the XPath spec:

* is a test that matches element nodes of any name. Comment nodes aren't element nodes.
comment() is a test that matches comment nodes.

Untested:

for my $comment_node ($doc->findnodes('//comment()')) {
   $comment_node->parentNode->removeChild($comment_node);
}
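To see the difference between the two node tests, this sketch runs both against a small sample document and prints each node's name and type number. It assumes the `:libxml` export tag of XML::LibXML, which provides the node type constants such as XML_COMMENT_NODE. The sample document is made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);   # node type constants, e.g. XML_COMMENT_NODE

# Illustrative sample document
my $doc = XML::LibXML->load_xml(
    string => '<TT><A>xyz</A><!-- my comment --></TT>'
);

my @elements = $doc->findnodes('//*');           # element nodes only
my @comments = $doc->findnodes('//comment()');   # comment nodes only

# Print the name and numeric type of every node found by each test
printf "%-10s type %d\n", $_->nodeName, $_->nodeType for @elements, @comments;

# True when every node returned by comment() really is a comment node
my $all_comments = !grep { $_->nodeType != XML_COMMENT_NODE } @comments;
print "comment() returned only comment nodes\n" if $all_comments;
```

This is why the loop in the question never printed type 8: the //* expression does not return comment nodes at all.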

Birei · Answer

I know it's not XML::LibXML, but here is another way to remove comments easily, using the XML::Twig module:

#!/usr/bin/env perl

use warnings;
use strict;
use XML::Twig;

# Parse the file named on the command line, dropping comments, and print it
XML::Twig->new(
    pretty_print => 'indented',
    comments     => 'drop',
)->parsefile( shift )->print;

Run it like:

perl script.pl xmlfile

That yields:

<TT>
  <A>xyz</A>
</TT>

The comments option also accepts the value process, which lets you work with comments using the XPath value #COMMENT.
