Getting contents of a div with PHP's DOM

I've looked through the other Stack Overflow questions on this topic, and none of the solutions provided there seem to work for me.

I have an HTML page (fetched with file_get_contents()), and in that HTML is a div with an id of "main". I need to get the contents of that div with PHP's DOMDocument, or something similar. In this situation I can't use the SimpleHTMLDom parser, which complicates things a bit.

Charles Zink asked Jun 20 '11 00:06


2 Answers

DOMDocument + XPath variation:

$xml = new DOMDocument();
$xml->loadHtml($temp);
$xpath = new DOMXPath($xml);

$html = '';
foreach ($xpath->query('//div[@id="main"]/*') as $node)
{
    $html .= $xml->saveXML($node);
}

The loop above returns the inner XML of the div, element children only. If you're looking for innerHTML() instead (see the PHP DOMDocument Reference Question), an XPath-based variant works as well.

Here is the adaptation, with the changes marked:

$html = '';
foreach ($xpath->query('//div[@id="main"]/node()') as $node)
//                                        ###### node() instead of *
{
    $html .= $xml->saveHTML($node);
//                     #### saveHTML() instead of saveXML()
}
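
For reference, the XPath variant above can be wrapped into a self-contained sketch. The function name innerHtmlById and the sample markup are hypothetical and not part of the original answer. The libxml_use_internal_errors() calls are an added assumption: scraped pages are often not well-formed, and this keeps parser warnings out of the output. The id is assumed to be a plain identifier without quote characters, since it is concatenated into the XPath expression.

```php
<?php
// Hypothetical helper (not from the original answer): returns the inner HTML
// of the div with the given id, or an empty string if no such div exists.
function innerHtmlById(string $source, string $id): string
{
    // Assumption: the input may be malformed, so collect libxml warnings
    // internally instead of letting PHP emit them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($source);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $html = '';
    // node() matches text and comment nodes as well as child elements.
    foreach ($xpath->query('//div[@id="' . $id . '"]/node()') as $node) {
        // Passing a node to saveHTML() requires PHP 5.3.6 or later.
        $html .= $doc->saveHTML($node);
    }

    return $html;
}

$sample = '<html><body><div id="main"><p>Hello</p> world</div></body></html>';
echo innerHtmlById($sample, 'main');
```

Because the query uses node() rather than *, the text that follows the paragraph in the sample is kept in the result.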
hakre answered Sep 23 '22 12:09


Using DOMDocument...

$dom = new DOMDocument;

$dom->loadHTML($html);

$main = $dom->getElementById('main');

To get the serialised HTML...

$html = '';
foreach($main->childNodes as $node) {
    $html .= $dom->saveXML($node, LIBXML_NOEMPTYTAG);
}

Use saveHTML($node) instead if your PHP version supports passing a node to it (PHP 5.3.6 and later).
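
A minimal end-to-end sketch of this approach, assuming PHP 5.3.6 or later so that saveHTML() accepts a node. The sample markup and the $inner variable are made up for illustration. The null check is an addition: getElementById() returns null when no element carries the requested id, and iterating over childNodes would then fail.

```php
<?php
// Hypothetical sample input standing in for the scraped page.
$html = '<html><body><div id="main"><p>First</p><p>Second</p></div></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$main = $dom->getElementById('main');

$inner = '';
// Guard against a missing element before touching childNodes.
if ($main !== null) {
    foreach ($main->childNodes as $node) {
        // Serialise each child node as HTML and append it.
        $inner .= $dom->saveHTML($node);
    }
}

echo $inner;
```

Using a separate $inner variable avoids overwriting the $html input, which makes the snippet easier to rerun against different pages.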

alex answered Sep 21 '22 12:09