DomCrawler Symfony: how to get content from a node excluding children?

Let's say I have an HTML page like this:

<html>
<head></head>
<body>
    Hello World!
    <div> my other content </div>
</body>
</html>

How do I get "Hello World!" using the DomCrawler?

I thought this would work:

$crawler = $crawler
    ->filter('body > div')
    ->reduce(function (Crawler $node, $i) {
        return false;
    });

But this obviously will give an error:

InvalidArgumentException: "The current node list is empty"
asked Aug 25 '14 11:08 by apfz


1 Answer

I don't know whether there is an easier way, but you can extract the contents of the text node using XPath:

$crawler->filterXPath('//body/text()')->text();

The result is a string containing "Hello World!" together with the whitespace before and after it, up to the first tag. If you only want the text itself, trim the value:

$helloWorld = trim($crawler->filterXPath('//body/text()')->text());
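
To check what that XPath expression selects without installing Symfony, here is a minimal sketch that uses PHP's built-in DOM extension (DOMDocument and DOMXPath) in place of DomCrawler. It is a swapped-in illustration, not the DomCrawler API, and the variable names are made up for the example.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of DomCrawler.
$html = <<<'EOT'
<html>
<head></head>
<body>
    Hello World!
    <div> my other content </div>
</body>
</html>
EOT;

$document = new DOMDocument();
$document->loadHTML($html);

$xpath = new DOMXPath($document);

// text() matches only the text nodes that are direct children of <body>,
// so the text inside <div> is not part of the result.
$textNodes = $xpath->query('//body/text()');

// The first text node still carries the surrounding whitespace, hence trim().
$helloWorld = trim($textNodes->item(0)->nodeValue);

echo $helloWorld, PHP_EOL;
```

The expression is the same one passed to filterXPath() above; only the way the document is loaded differs.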

Trimming the result works in your case. However, text() returns the content of the first matched node only, so if the body contains several text nodes, e.g.:

<html>
<head></head>
<body>
    Hello World!
    <div> my other content </div>
    Some other text
</body>
</html>

You can extract all of them instead:

$crawler->filterXPath('//body/text()')->extract(['_text']);

This will return an array:

Array
(
    [0] =>
        Hello World!

    [1] =>
        Some other text

)
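
If the surrounding whitespace is unwanted here as well, each entry can be trimmed and any whitespace-only nodes dropped. The sketch below is one way to do that, assuming the array has the shape shown above. cleanTextNodes is a hypothetical helper name, not part of DomCrawler, and the sample input is hand-written to mimic the extracted array.

```php
<?php
/**
 * Trim every extracted text node and drop those that held only whitespace.
 * Hypothetical helper for illustration; not part of DomCrawler.
 *
 * @param string[] $texts Raw text node contents, e.g. from extract(['_text'])
 * @return string[] Trimmed, non-empty strings, re-indexed from 0
 */
function cleanTextNodes(array $texts): array
{
    $trimmed = array_map('trim', $texts);

    // Keep only the entries that still have content after trimming.
    $nonEmpty = array_filter($trimmed, static function (string $text): bool {
        return $text !== '';
    });

    // array_filter() preserves keys, so re-index the result.
    return array_values($nonEmpty);
}

// Hand-written sample mimicking the array shown above, plus a blank node.
$raw = ["\n    Hello World!\n    ", "\n    Some other text\n", "\n"];

print_r(cleanTextNodes($raw));
```

With the crawler from this answer, the call would look like cleanTextNodes($crawler->filterXPath('//body/text()')->extract(['_text'])).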
answered Nov 18 '22 14:11 by Igor Pantović