I have this HTML on my site (http://testsite.com/test.php):
<div class="first">
    <div class="second">
        <a href="/test.php">click</a>
        <span>back</span>
    </div>
</div>
<div class="first">
    <div class="second">
        <a href="/test.php">click</a>
        <span>back</span>
    </div>
</div>
I would like to get:
<div class="first">
    <div class="second">
        <a href="/test.php">click</a>
    </div>
</div>
<div class="first">
    <div class="second">
        <a href="/test.php">click</a>
    </div>
</div>
So I would like to remove the span. I use Goutte in Symfony2, based on http://symfony.com/doc/current/components/dom_crawler.html:
$client = new Client();
$crawler = $client->request('GET', 'http://testsite.com/test.php');
$crawler->filter('.first .second')->each(function ($node) {
    //??????
});
As explained in the docs:
The DomCrawler component eases DOM navigation for HTML and XML documents.
and also:
While possible, the DomCrawler component is not designed for manipulation of the DOM or re-dumping HTML/XML.
DomCrawler is designed to extract details from DOM documents rather than modifying them.
However...
Since PHP hands objects around by handle rather than copying them, and Crawler is basically a wrapper around DOMNode objects, it's technically possible to modify the underlying DOM document:
use Symfony\Component\DomCrawler\Crawler;

// will remove all span nodes inside .second nodes
$crawler->filter('.first .second span')->each(function (Crawler $crawler) {
    foreach ($crawler as $node) {
        $node->parentNode->removeChild($node);
    }
});
Here's a working example: https://gist.github.com/jakzal/8dd52d3df9a49c1e5922
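If you want to try the idea without Goutte or a live site, here is a minimal sketch that uses only PHP's built-in DOM extension. It is not the DomCrawler API, just the same `removeChild` call applied directly to the DOM nodes that the Crawler wraps. The inline `$html` string is a hypothetical stand-in for the page from the question, and the XPath expression is my assumed equivalent of the `.first .second span` selector for this exact markup.

```php
<?php
// Hypothetical sample markup standing in for http://testsite.com/test.php.
$html = '<div class="first"><div class="second">'
      . '<a href="/test.php">click</a><span>back</span>'
      . '</div></div>'
      . '<div class="first"><div class="second">'
      . '<a href="/test.php">click</a><span>back</span>'
      . '</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Assumed XPath equivalent of ".first .second span" for this markup
// (exact class match; it would miss elements with several classes).
$query = '//div[@class="first"]//div[@class="second"]//span';

// Copy the matches into an array first, so removal never
// interferes with the iteration.
$spans = iterator_to_array($xpath->query($query));

foreach ($spans as $span) {
    $span->parentNode->removeChild($span);
}

// Re-dump the modified document; the spans are gone, the links remain.
$result = $doc->saveHTML();
echo $result;
```

Note that `saveHTML()` dumps the whole document, including the `html` and `body` wrappers that `loadHTML()` adds around a fragment.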