DomCrawler Symfony: how to get content from a node excluding children?

Question

Let's say I have an HTML page like this:

<html>
<head></head>
<body>
    Hello World!
    <div> my other content </div>
</body>
</html>

How do I get "Hello World!" using the DomCrawler?

I thought this would work:

$crawler = $crawler
    ->filter('body > div')
    ->reduce(function (Crawler $node, $i) {
        return false;
    });

But this obviously will give an error:

InvalidArgumentException: "The current node list is empty"

Igor Pantović · Accepted Answer

I don't know whether there is an easier way, but you can extract the contents of the text node using XPath:

$crawler->filterXPath('//body/text()')->text();

The result is a string containing "Hello World!" together with the whitespace before and after it, up to the first tag. If you want only the text itself, trim the value:

$helloWorld = trim($crawler->filterXPath('//body/text()')->text());

This works in your case. However, if the body contains multiple text nodes, e.g.:

<html>
<head></head>
<body>
    Hello World!
    <div> my other content </div>
    Some other text
</body>
</html>

You can extract all of them:

$crawler->filterXPath('//body/text()')->extract(['_text']);

This will return an array:

Array
(
    [0] =>
        Hello World!

    [1] =>
        Some other text

)
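
As the output shows, each entry still carries the surrounding whitespace. The sketch below is one possible way to get clean strings. To stay self-contained it applies the same `//body/text()` expression through PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) rather than through DomCrawler, and the helper name `directTextOf` is made up for illustration.

```php
<?php
// Hypothetical helper: returns the direct text children of <body>,
// trimmed, with whitespace-only nodes skipped.
function directTextOf(string $html, string $xpath = '//body/text()'): array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml emits for imperfect real-world HTML.
    @$doc->loadHTML($html);

    $texts = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $value = trim($node->nodeValue);
        if ($value !== '') {
            $texts[] = $value;
        }
    }

    return $texts;
}

$html = <<<'EOT'
<html>
<head></head>
<body>
    Hello World!
    <div> my other content </div>
    Some other text
</body>
</html>
EOT;

$texts = directTextOf($html);
print_r($texts);
```

With DomCrawler itself, the same cleanup can probably be done by passing the array returned by extract(['_text']) through array_map('trim', ...) and then dropping the empty strings.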

Tags:

symfony

web-crawler

apfz