I am using the htmlcxx library to read an HTML file and generate the same HTML file with additional content.
I can read the file with no problem, but simply emitting the original HTML file doesn't correctly include the end tags. That is, when I simply iterate and output the entire DOM, no closing tags are emitted.
I know that there is a closingText() interface for a node (see Node.h), but I can't seem to find a way to use it that lets me do what I need.
Here is how I'm dumping the DOM:
tree<HTML::Node>::iterator it = dom.begin();
tree<HTML::Node>::iterator end = dom.end();
for (; it != end; ++it)
{
    cout << it->text();
}
The above gives me:
<div>
<li>
<div>
(blank)
(blank)
(blank)
<div>
(blank)
for the following html:
<div>
<li>
<div>
</div>
</li>
</div>
<div>
</div>
Is there anything I can do short of modifying the library's code?
There is virtually no documentation provided with this library, and only a very small set of example code. The tree manipulation was lifted from http://tree.phi-sci.com and that site has a little more documentation, but not much in the way of additional example code.
Every example that I have seen uses the basic "depth-first" iterator, which allows you to traverse the tree using a simple for loop. On its own that isn't very useful here: to serialize an HTML tree you need to know where each sub-tree ends so the closing tag can be emitted after its children, and the natural way to get that is recursion.
I hacked about until I got a recursive algorithm working. This may not be the best way to use the library, but it seems to work.
void walk_tree( tree<HTML::Node> const & dom )
{
    tree<HTML::Node>::iterator it = dom.begin();
    cout << it->text();
    for ( unsigned i = 0; i < dom.number_of_children(it); i++ )
    {
        walk_tree( dom.child(it, i) );
    }
    cout << it->closingText();
}
As you can see from my code, the text() and closingText() functions bracket whatever content is contained in the sub-tree, which is processed recursively.
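The same bracketing can probably also be done in a flat loop, as long as you know the depth of each node as you visit it: keep a stack of pending closing texts and pop it whenever the depth drops. Below is a minimal sketch of that idea. It does not use htmlcxx at all; Visit and serialize_flat are hypothetical stand-ins I made up so the example compiles on its own. With the real library you would presumably fill in the depth from the tree's depth() call, but I have not checked that against the copy of tree.hh that htmlcxx ships.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one node as a pre-order (depth-first) walk
// would visit it. This is NOT htmlcxx's HTML::Node.
struct Visit {
    int depth;                // 0 for top-level nodes, 1 for their children, ...
    std::string text;         // opening text, e.g. "<div>"
    std::string closingText;  // closing text, e.g. "</div>" (may be empty)
};

// Serialize a pre-order sequence of nodes, emitting each closing text
// once the walk has left that node's sub-tree.
std::string serialize_flat(const std::vector<Visit>& visits) {
    std::string out;
    std::vector<std::string> pending;  // closing texts of nodes still open

    for (const Visit& v : visits) {
        // Any open node at the same depth or deeper cannot be an
        // ancestor of v, so its sub-tree is finished: close it.
        while (static_cast<int>(pending.size()) > v.depth) {
            out += pending.back();
            pending.pop_back();
        }
        out += v.text;
        pending.push_back(v.closingText);
    }

    // Close whatever is still open at the end of the walk.
    while (!pending.empty()) {
        out += pending.back();
        pending.pop_back();
    }
    return out;
}
```

Feeding it the four nodes from the question (depths 0, 1, 2, 0) should give back the original markup with all the closing tags in place. Any additional content can be appended to out right before a closing text is popped, or right after a node's text is written.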