
htmlcxx API usage

Tags: c++, html

I am using the htmlcxx library to read an HTML file and generate the same HTML file with additional content.

I can read the file without any problem, but writing the DOM back out does not reproduce the original HTML: when I iterate over the entire DOM and output each node, no closing tags are emitted.

I know that there is a closingText() interface for a node (see Node.h), but I can't seem to find a way to use it that lets me do what I need.

Here is how I'm dumping the DOM:

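// dom is the parsed tree<HTML::Node>
tree<HTML::Node>::iterator it = dom.begin();
tree<HTML::Node>::iterator end = dom.end();
for (; it != end; ++it)
{
    cout << it->text();   // prints only each node's opening text
}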

The above gives me:

<div>
    <li>
       <div>
(blank)
(blank)
(blank)
<div>
(blank)

for the following HTML:

<div>
    <li>
        <div>
        </div>
    </li>
</div>
<div>
</div>

Is there anything I can do other than changing the code?

asked Jul 14 '12 at 02:07 by dev_overflow


1 Answer

There is virtually no documentation provided with this library, and only a very small set of example code. The tree-manipulation code was lifted from http://tree.phi-sci.com, and that site has a little more documentation, but not much in the way of additional example code.

Every example that I have seen uses the basic "depth-first" iterator, which lets you traverse the tree with a simple for loop. That is not much help for serializing an HTML tree: the flat loop visits each node once, on the way down, so there is no natural point at which to emit a node's closing text after its children. Recursion is a much better fit.

I hacked about until I got a recursive algorithm working. This may not be the best way to use the library, but it seems to work.

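#include <iostream>
#include <htmlcxx/html/ParserDom.h>

using namespace std;
using namespace htmlcxx;

// Print a (sub)tree: opening text, then each child sub-tree, then closing text.
void walk_tree( tree<HTML::Node> const & dom )
{
    tree<HTML::Node>::iterator it = dom.begin();   // root of this (sub)tree
    cout << it->text();
    for ( unsigned i = 0; i < dom.number_of_children(it); i++ )
    {
        // child() returns an iterator; it is implicitly converted to a tree,
        // which appears to copy that sub-tree
        walk_tree( dom.child(it, i) );
    }
    cout << it->closingText();
}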

As the code shows, text() and closingText() bracket the content of each sub-tree: the opening text is written first, the children are processed recursively, and the closing text is written last.

answered Nov 02 '22 at 03:11 by Brent Bradburn