libxml2 HTML parsing

I'm parsing HTML with libxml2, using XPath to find elements. Once I've found the element I'm looking for, how can I get that element's HTML as a string, bearing in mind that it may have many child elements? For example, given this document:

<html>
    <head>
        <title>Some document</title>
    </head>

    <body>
        <p id="faq">
            Some kind of text <a href="http://www.nowhere.com/">here</a>.
        </p>
    </body>
</html>

Say I retrieve the body element with XPath and then get its HTML. I'd like to end up with a string containing:

<body>
    <p id="faq">
        Some kind of text <a href="http://www.nowhere.com/">here</a>.
    </p>
</body>

How can I do this?

asked Aug 22 '10 21:08 by johndoe


1 Answer

That is the purpose of xmlNodeDump. When you have an xmlNodePtr node, do something like:

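xmlBufferPtr nodeBuffer = xmlBufferCreate();
// xmlNodeDump returns the number of bytes written, or -1 on error
if (xmlNodeDump(nodeBuffer, doc, node, 0, 1) != -1) {
    // xmlBufferContent gives the serialized node as a string owned by nodeBuffer
    const xmlChar *html = xmlBufferContent(nodeBuffer);
    // ... Do something with html (copy it if you need it after the buffer is freed)
}
// When done:
xmlBufferFree(nodeBuffer);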

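The output includes the node itself and all of its descendants. The 4th parameter (level) is the indentation level to start from, and the 5th (format) enables indented output when set to 1.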

answered Oct 21 '22 16:10 by Matthew Flaschen