How do I do an XPath query on a DOMNode?

Tags:

dom

php

Is there a way to run an XPath query on a DOMNode? Or at least convert it to a DOMXPath?

    <html>
      ...
      <div id="content">
         ...
         <div class="listing">
             ...
             <div></div>
             <div></div>
             <div class='foo'>
               <h3>Get me 1</h3>
               <a>and me too 1</a>
             </div>
         </div>
         <div class="listing">
             ...
             <div></div>
             <div></div>
             <div class='foo'>
               <h3>Get me 2</h3>
               <a>and me too 1</a>
             </div>
         </div>
         ....
      </div>
    </html>

This is my code. I am trying to build a list of arrays, each holding the values of the h3 and a tags from one listing. To do that, I need to get each listing and then get the values of the h3 and a tags inside it.

    $html_dom = new DOMDocument();
    @$html_dom->loadHTML($html);
    $x_path = new DOMXPath($html_dom);

    $nodes = $x_path->query("//div[@id='content']//div[@class='listing']");

    foreach ($nodes as $node) {
        // I want to further dig down here using query on a DOMNode
    }
developarvin asked May 24 '13 03:05



1 Answer

Pass the node as the second argument to DOMXPath::query():

contextnode: The optional contextnode can be specified for doing relative XPath queries. By default, the queries are relative to the root element.

Example:

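    foreach ($nodes as $node) {
        // In the markup above, h3 and a are descendants (not direct children)
        // of each listing, so the relative paths start with .//
        foreach ($x_path->query('.//h3 | .//a', $node) as $child) {
            echo $child->nodeValue, PHP_EOL;
        }
    }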

This uses the union operator (|) and outputs

    Get me 1
    and me too 1
    Get me 2
    and me too 1
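One detail of relative queries is easy to miss: a path that starts with // is still evaluated from the document root, even when a context node is passed. The following is a minimal sketch of that difference, assuming a reduced sample of the question's markup (the `$html` string and the variable names are made up for illustration):

```php
<?php
// Hypothetical sample markup, reduced from the HTML in the question.
$html = <<<HTML
<html><body><div id="content">
  <div class="listing"><div></div><div class="foo"><h3>Get me 1</h3><a>and me too 1</a></div></div>
  <div class="listing"><div></div><div class="foo"><h3>Get me 2</h3><a>and me too 1</a></div></div>
</div></body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of using @
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$listings = $xpath->query("//div[@id='content']//div[@class='listing']");
$first = $listings->item(0);

// './/h3' is relative to the context node: only h3 elements inside $first.
$relative = $xpath->query('.//h3', $first);

// '//h3' is absolute: it searches the whole document, context node or not.
$absolute = $xpath->query('//h3', $first);

echo $relative->length, PHP_EOL; // 1
echo $absolute->length, PHP_EOL; // 2
```

So when digging into a node, start the expression with a step name or with .// rather than with //.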

If you don't need any complex querying, you can also do

    foreach ($nodes as $node) {
        foreach ($node->getElementsByTagName('a') as $a) {
            echo $a->nodeValue, PHP_EOL;
        }
    }

Or even by iterating the child nodes (note that this includes all the text nodes)

    foreach ($nodes as $node) {
        foreach ($node->childNodes as $child) {
            echo $child->nodeName, PHP_EOL;
        }
    }
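If the text nodes get in the way when iterating childNodes, one option is to check nodeType and keep only element nodes. A small sketch, assuming a well-formed fragment loaded with loadXML() purely to keep the example short (the fragment and the `$names` variable are illustrative):

```php
<?php
// Hypothetical fragment; loadXML() keeps the whitespace-only text nodes
// between the elements, which is what this example wants to show.
$doc = new DOMDocument();
$doc->loadXML('<div class="foo">
  <h3>Get me 1</h3>
  <a>and me too 1</a>
</div>');

$names = [];
foreach ($doc->documentElement->childNodes as $child) {
    // Skip text nodes (and anything else that is not an element).
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $names[] = $child->nodeName;
    }
}

echo implode(', ', $names), PHP_EOL; // h3, a
```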

However, all of that is unneeded since you can fetch these nodes directly:

    $nodes = $x_path->query("/html/body//div[@class='listing']/div[last()]");

    foreach ($nodes as $i => $node) {
        echo $i, $node->nodeValue, PHP_EOL;
    }

will give you two nodes, the last div child of each div whose class attribute value is listing, and output their combined text node values, including whitespace

    0
                Get me 1
                and me too 1

    1
                Get me 2
                and me too 1
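If the surrounding whitespace is unwanted, one possible approach is to let XPath collapse it with normalize-space(), called through DOMXPath::evaluate() with the node as context. This is only a sketch, assuming a single-listing sample loaded as XML for brevity (the sample document and the `$texts` variable are made up):

```php
<?php
// Hypothetical sample with one listing, loaded as XML to keep it short.
$doc = new DOMDocument();
$doc->loadXML('<html><body>
  <div class="listing">
    <div></div>
    <div class="foo">
      <h3>Get me 1</h3>
      <a>and me too 1</a>
    </div>
  </div>
</body></html>');
$xpath = new DOMXPath($doc);

$texts = [];
foreach ($xpath->query("/html/body//div[@class='listing']/div[last()]") as $node) {
    // normalize-space(.) trims the node's string value and collapses
    // internal runs of whitespace into single spaces.
    $texts[] = $xpath->evaluate('normalize-space(.)', $node);
}

echo implode(PHP_EOL, $texts), PHP_EOL; // Get me 1 and me too 1
```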

Likewise, the following

"//div[@class='listing']/div[last()]/node()[name() = 'h3' or name() = 'a']" 

will give you the four child H3 and A nodes and output

    0Get me 1
    1and me too 1
    2Get me 2
    3and me too 1

If you need to differentiate these by name while iterating over them, you can do

    foreach ($nodes as $i => $node) {
        echo $i, $node->nodeName, $node->nodeValue, PHP_EOL;
    }

which will then give

    0h3Get me 1
    1aand me too 1
    2h3Get me 2
    3aand me too 1
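To end up with the list of arrays the question asks for, the context-node technique can be combined with DOMXPath::evaluate(), which returns a plain string for a string() expression. A minimal sketch, assuming a reduced sample of the question's markup; the array keys 'h3' and 'a' and the variable names are arbitrary choices for illustration:

```php
<?php
// Hypothetical sample markup, reduced from the HTML in the question.
$html = <<<HTML
<html><body><div id="content">
  <div class="listing">
    <div></div>
    <div class="foo"><h3>Get me 1</h3><a>and me too 1</a></div>
  </div>
  <div class="listing">
    <div></div>
    <div class="foo"><h3>Get me 2</h3><a>and me too 1</a></div>
  </div>
</div></body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of using @
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query("//div[@id='content']//div[@class='listing']") as $listing) {
    // string(...) yields the text of the first matching node below $listing,
    // or an empty string if there is no match.
    $rows[] = [
        'h3' => $xpath->evaluate('string(.//h3)', $listing),
        'a'  => $xpath->evaluate('string(.//a)', $listing),
    ];
}

print_r($rows);
```

Each entry of `$rows` then holds the h3 and a text of one listing, for example `['h3' => 'Get me 1', 'a' => 'and me too 1']` for the first one.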
Gordon answered Sep 20 '22 13:09
