
Nested DOM XPath?

Suppose you have something like

<div>
    <p>...</p>
    <p>There are an unbounded number of these p tags</p>
    <p>etc etc...could be 4 of these one time, then 9 the next time</p>
</div>
<div>
    <p>Same here, an unbounded number</p>
    <p>etc</p>
</div>
<div>
    <p>And so on...</p>
    <p>...</p>
    <p>...</p>
    <p>...</p>
</div>

Suppose I wanted to grab the 1st p node out of the first div, the 2nd p node out of the second div, and the 3rd p node out of the third div. If this were XML, I'd use SimpleXMLElement and do something like:

foreach ($data->xpath('//div') as $cur) {
    // Work within each <div> that is returned; $cur->xpath() could even be called again here if needed
}
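A minimal runnable sketch of that SimpleXML approach, aimed at the stated goal (the 1st p of the first div, the 2nd of the second, the 3rd of the third). The sample markup is hypothetical and is wrapped in a single root element because the question's three sibling divs are not well-formed XML on their own; the key point is that SimpleXMLElement::xpath() evaluates a relative path such as p[2] against the element it is called on.

```php
<?php
// Hypothetical sample: the question's structure wrapped in one <root>
// element so that it parses as well-formed XML.
$xml = '<root>'
    . '<div><p>a1</p><p>a2</p></div>'
    . '<div><p>b1</p><p>b2</p></div>'
    . '<div><p>c1</p><p>c2</p><p>c3</p><p>c4</p></div>'
    . '</root>';

$data = new SimpleXMLElement($xml);
$picked = [];
foreach ($data->xpath('//div') as $i => $cur) {
    // 'p[N]' is relative to $cur, so it only sees this div's <p> children.
    $hits = $cur->xpath('p[' . ($i + 1) . ']');
    $picked[] = $hits ? (string) $hits[0] : null;
}
// $picked is ['a1', 'b2', 'c3']
```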

But how could you select each div individually and then work within it, or do the equivalent, with DOMXPath? If I did, say,

$query = $data->query('//div');

I would get a DOMNodeList of DOMElement objects, which as far as I know cannot be used in another DOMXPath query (if they could, that would solve it). So I couldn't nest XPath queries, or at least I get no results when I take the nodeValue / textContent, create a new DOMDocument and DOMXPath from it, and query that. The nodeValue / textContent appear to have all tags stripped, which I imagine is why nothing is returned.
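The stripped tags are expected behavior: for an element, nodeValue (like textContent) returns only the concatenated text of its descendants, so re-parsing it cannot find any elements. A small sketch with hypothetical markup contrasting it with DOMDocument::saveHTML(), which accepts a node argument and serializes that node including its tags:

```php
<?php
// Hypothetical markup for illustration only.
$html = '<div><p>first</p><p>second</p></div>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$div = $doc->getElementsByTagName('div')->item(0);

$text = $div->nodeValue;        // "firstsecond": text only, tags gone
$markup = $doc->saveHTML($div); // serialized <div> with its <p> tags intact
```

Re-parsing $markup would work, but it is usually unnecessary, because the DOMElement itself can serve as the starting point of another query.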

Now I could split on '\n' in this case and parse the nodeValue, but imagine that each div had an unbounded number of each type of child node and we were looking for something, say, 5 levels down. That would become a giant, ugly mess.

Basically, SimpleXMLElement->xpath preserves the document structure, whereas DOM XPath does not appear to.
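For comparison, a sketch (hypothetical markup) showing that the DOMElement objects returned by DOMXPath::query() remain attached to the full document tree, and that simplexml_import_dom() can wrap one as a SimpleXMLElement without re-parsing, so SimpleXML-style relative xpath() calls remain available on DOM results:

```php
<?php
// Hypothetical sample markup, loaded with DOM as in the question.
$html = '<div><p>a1</p><p>a2</p></div><div><p>b1</p><p>b2</p></div>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The second <div> is still linked to its parent in the tree
// (the HTML parser adds implied <html> and <body> elements).
$second = $xpath->query('//div')->item(1);
echo $second->parentNode->nodeName, "\n"; // body

// Wrap the same node for SimpleXML; 'p[2]' is relative to that div.
$sx = simplexml_import_dom($second);
$p = $sx->xpath('p[2]');
echo (string) $p[0], "\n"; // b2
```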

So, is there a good general way of doing this?

asked Feb 16 '23 08:02 by 章 哲



1 Answer

You can query nested elements by passing a node returned by one query as the context node of another. For example, if you want the text of the first paragraph of the second div, you could do it as follows:

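// $html holds the markup from the question
$doc = new DOMDocument();
if ( ! @$doc->loadHTML($html)){  // @ hides warnings about imperfect HTML
    exit('Could not parse the HTML');
}
$xpath = new DOMXPath($doc);
$res = $xpath->query('//div');                // every <div> in the document
$sub = $xpath->query('.//p', $res->item(1));  // paragraphs of the second div (context node)
echo trim($sub->item(0)->nodeValue);          // first paragraph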

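Notice that $sub comes from a query evaluated relative to a node returned by the first query: $res->item(1) is passed as the context node, and the leading dot in .//p restricts the search to descendants of that node.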

The output is:

Same here, an unbounded number
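Building on this, a sketch of the question's full goal (the N-th p of the N-th div) using the same context-node technique in a loop. The markup is a hypothetical stand-in, with one extra p nested inside a blockquote to show that .//p reaches deeper descendants. It also contrasts .//p with //p: a path that starts with // begins at the document root, so the context node is effectively ignored.

```php
<?php
// Hypothetical stand-in for the question's markup.
$html = '<div><p>a1</p><p>a2</p></div>'
      . '<div><p>b1</p><p>b2</p></div>'
      . '<div><p>c1</p><p>c2</p><p>c3</p><blockquote><p>deep</p></blockquote></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$divs = $xpath->query('//div');
$picked = [];
for ($i = 0; $i < $divs->length; $i++) {
    // 'p[N]' is evaluated with this div as the context node,
    // so it selects the N-th <p> child of this div only.
    $p = $xpath->query('p[' . ($i + 1) . ']', $divs->item($i))->item(0);
    $picked[] = $p ? $p->nodeValue : null;
}
// $picked is ['a1', 'b2', 'c3']

$third = $divs->item(2);
$inside = $xpath->query('.//p', $third)->length;    // 4: descendants of the third div only
$everywhere = $xpath->query('//p', $third)->length; // 8: every <p> in the document
```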

answered Feb 23 '23 20:02 by Expedito
