
Getting element content with simple_html_dom

I'm using simple_html_dom to extract elements from HTML pages. I have some div elements like the one below. All I want is to get the "Fine Thanks" text in each div, that is, the text that is not inside any sub-element. How can I do it?

<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>

AshKan asked Apr 11 '13 06:04


1 Answer

It should be simply $html->find('div.right > text'), but that won't work because Simple HTML DOM Parser doesn't seem to support direct descendant queries.

So you'd have to find all <div> elements first and search the child nodes for a text node. Unfortunately, the ->childNodes() method is mapped to ->children() and thus only returns elements.

A working solution is to call ->find('text') on each <div> element, after which you filter the results based on the parent node.

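// $html is the simple_html_dom object for the loaded page
foreach ($html->find('div.right') as $parent) {
    foreach ($parent->find('text') as $node) {
        // keep only non-empty text nodes whose direct parent is the <div> itself
        if ($node->parent() === $parent && strlen($t = trim($node->plaintext))) {
            echo $t, PHP_EOL;
        }
    }
}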

Using DOMDocument, this XPath expression will do the same work without the pain:

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

foreach ($xp->query('//div[@class="right"]/text()') as $node) {
    if (strlen($t = trim($node->textContent))) {
        echo $t, PHP_EOL;
    }
}
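
For anyone who wants to try the DOMDocument approach against the markup from the question, here is a minimal, self-contained sketch. The helper name directText() and its $class parameter are my own and not part of any library, and the class name is interpolated into the XPath as-is, so this assumes it contains no quotes. It also silences libxml's parse warnings, which real-world HTML tends to trigger, and matches the class as a whole token so that "right" does not also match something like "upright".

```php
<?php
// Sketch only: directText() is a hypothetical helper, not a library function.
// It returns the non-empty text nodes that are direct children of
// <div> elements carrying the given class.
function directText(string $html, string $class = 'right'): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);

    // Assumption: $class contains no quote characters.
    // The concat/normalize-space idiom matches the class as a whole token.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]/text()',
        $class
    );

    $found = [];
    foreach ($xp->query($query) as $node) {
        $text = trim($node->textContent);
        if ($text !== '') {
            $found[] = $text;
        }
    }
    return $found;
}

// The markup from the question.
$sample = <<<HTML
<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>
HTML;

print_r(directText($sample));
```

With the sample above, the only entry in the returned array should be "Fine Thanks", since the text inside the h2, a and span elements is not a direct child of the div.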

Ja͢ck answered Sep 21 '22 07:09