
Using DOMDocument to extract from HTML document by class

The DOMDocument class has methods to get elements by id and by tag name (getElementById and getElementsByTagName), but not by class. Is there a way to do this?

As an example, how would I select the div from the following markup?

<html>
...
<body>
...
<div class="foo">
...
</div>
...
</body>
</html>
famblycat asked Feb 25 '11 19:02


2 Answers

The simple answer is to use XPath:

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//*[@class="foo"]')->item(0);

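But that only matches elements whose class attribute is exactly "foo"; it misses elements with several space-separated classes, such as class="foo bar". To match one class among several, use this query, replacing class with the class name you want:

//*[contains(concat(' ', normalize-space(@class), ' '), ' class ')]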
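As a rough sketch of how that query could be wrapped up, the helper below builds the expression for a given class name. The function name getElementsByClassName and the sample markup are assumptions made for illustration; they are not part of DOMDocument. It also assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper (not a DOMDocument method): returns the elements whose
// class attribute contains $class as one of its space-separated tokens.
// Assumes $class contains no quote characters.
function getElementsByClassName(DOMDocument $dom, string $class): DOMNodeList
{
    $xpath = new DOMXPath($dom);
    // Pad the normalized class list with spaces so ' foo ' only matches a whole token.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    return $xpath->query($query);
}

// Sample markup (an assumption for this sketch): two divs carry the class
// "foo" among others, one carries the unrelated class "foobar".
$html = '<html><body><div class="foo bar">A</div><div class="foobar">B</div><div class="baz  foo">C</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$matches = getElementsByClassName($dom, 'foo');
foreach ($matches as $node) {
    echo $node->textContent, "\n"; // prints A, then C
}
```

The "foobar" div is skipped because the padded search string ' foo ' has to match a whole token, not a substring.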
ircmaxell answered Nov 14 '22 09:11


$html = '<html><body><div class="foo">Test</div><div class="foo">ABC</div><div class="foo">Exit</div><div class="bar"></div></body></html>';

$dom = new DOMDocument();
// @ suppresses the warnings loadHTML() raises on malformed markup
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Every class attribute node in the document
$allClass = $xpath->query("//@class");
// Every element whose class attribute is exactly 'bar'
$allClassBar = $xpath->query("//*[@class='bar']");

echo "There are " . $allClass->length . " elements with a class attribute<br>";

echo "There are " . $allClassBar->length . " elements with a class attribute of 'bar'<br>";
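Beyond counting, the list returned by query() can be looped over to read each matched element. A minimal sketch, reusing the same markup as above; the $texts variable is just an illustrative name:

```php
<?php
$html = '<html><body><div class="foo">Test</div><div class="foo">ABC</div><div class="foo">Exit</div><div class="bar"></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect the text of every div whose class attribute is exactly 'foo'
$texts = [];
foreach ($xpath->query("//div[@class='foo']") as $div) {
    $texts[] = $div->textContent;
}

echo implode(', ', $texts), "\n"; // prints: Test, ABC, Exit
```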
Jake N answered Nov 14 '22 08:11