Let's say I have an HTML page like this:
<html>
<head></head>
<body>
Hello World!
<div> my other content </div>
</body>
</html>
How do I get "Hello World" from the DOM Crawler?
I thought this would work:
$crawler = $crawler
    ->filter('body > div')
    ->reduce(function (Crawler $node, $i) {
        return false;
    });
But this obviously gives an error:
InvalidArgumentException: "The current node list is empty"
I don't know whether there is an easier way, but you can extract the text node contents using XPath:
$crawler->filterXPath('//body/text()')->text();
The result will be a string containing "Hello World!" plus the whitespace before and after the text, up to the first tag. So if you want just the text itself, you can trim the value:
$helloWorld = trim($crawler->filterXPath('//body/text()')->text());
This works in your case. However, if you have multiple text nodes in the body, e.g.:
<html>
<head></head>
<body>
Hello World!
<div> my other content </div>
Some other text
</body>
</html>
You might do:
$crawler->filterXPath('//body/text()')->extract(['_text']);
This will return an array:
Array
(
[0] =>
Hello World!
[1] =>
Some other text
)
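If you want to see what the XPath selects without setting up Symfony, here is a rough sketch using PHP's built-in DOMDocument and DOMXPath instead of the Crawler (that swap is my assumption for illustration, not the asker's setup). The variable names are made up. It trims each text node and skips the ones that contain only whitespace:

```php
<?php
// Sketch only: uses PHP's bundled DOM extension, not Symfony's Crawler.
$html = <<<'HTML'
<html>
<head></head>
<body>
Hello World!
<div> my other content </div>
Some other text
</body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// text() matches only text nodes that are direct children of <body>,
// so the text inside the <div> is not selected.
$texts = [];
foreach ($xpath->query('//body/text()') as $node) {
    $value = trim($node->nodeValue);
    if ($value !== '') { // skip whitespace-only text nodes
        $texts[] = $value;
    }
}

print_r($texts);
```

With the Crawler, the same cleanup should be possible by running array_map('trim', ...) over the array that extract(['_text']) returns.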