
php xpath: query within a query result

Tags: php, xpath

I'm trying to parse an HTML file.

The idea is to go through each div that has the attribute class='thebest' and, inside it, fetch the text of the spans with the title and desc classes.

Here is my code:

<?php

$example=<<<KFIR
<html>
<head>
<title>test</title>
</head>
<body>
 <div class="a">moshe1
<div class="aa">haim</div>
 </div>
 <div class="a">moshe2</div>
 <div class="b">moshe3</div>

<div class="thebest">
<span class="title">title1</span>
<span class="desc">desc1</span>
</div>
<div class="thebest">
<span class="title">title2</span>
<span class="desc">desc2</span>
</div>

</body>
</html>
KFIR;


$doc = new DOMDocument();
@$doc->loadHTML($example);
$xpath = new DOMXPath($doc);
$expression="//div[@class='thebest']";
$arts = $xpath->query($expression);

foreach ($arts as $art) {
    $arts2=$xpath->query("//span[@class='title']",$art);
    echo $arts2->item(0)->nodeValue;
    $arts2=$xpath->query("//span[@class='desc']",$art);
    echo $arts2->item(0)->nodeValue;
}
echo "done";

The expected results are:

title1desc1title2desc2done 

The results that I'm receiving are:

title1desc1title1desc1done
ufk asked Jul 06 '10 17:07


2 Answers

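Make the queries relative: start them with a dot (e.g. ".//…"). An expression that begins with "//" is evaluated from the document root even when a context node is passed as the second argument, which is why every iteration returned the first div's spans.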

foreach ($arts as $art) {
    // Note: single slash (direct child)
    $titles = $xpath->query("./span[@class='title']", $art);
    if ($titles->length > 0) {
        $title = $titles->item(0)->nodeValue;
        echo $title;
    }

    $descs = $xpath->query("./span[@class='desc']", $art);
    if ($descs->length > 0) {
        $desc = $descs->item(0)->nodeValue;
        echo $desc;
    }
}
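To see the difference between the two forms side by side, here is a minimal sketch. The markup is a cut-down version of the question's, and the variable names $absolute and $relative are only illustrative.

```php
<?php
// Cut-down sample markup (an assumption for this sketch, based on the question).
$html = <<<HTML
<html><body>
<div class="thebest"><span class="title">title1</span><span class="desc">desc1</span></div>
<div class="thebest"><span class="title">title2</span><span class="desc">desc2</span></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$absolute = '';
$relative = '';
foreach ($xpath->query("//div[@class='thebest']") as $div) {
    // "//span" starts at the document root, so the context node has no effect.
    $absolute .= $xpath->query("//span[@class='title']", $div)->item(0)->nodeValue;
    // ".//span" starts at $div and only looks at its descendants.
    $relative .= $xpath->query(".//span[@class='title']", $div)->item(0)->nodeValue;
}

echo $absolute, "\n"; // title1title1
echo $relative, "\n"; // title1title2
```

".//span" matches at any depth below the div, while "./span" (as used above) matches direct children only; either works for the markup in the question.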
salathe answered Nov 11 '22 21:11


Instead of running the second query, try textContent:

foreach ($arts as $art) {
    echo $art->textContent;
}

textContent returns the text content of this node and its descendants, so it also includes any whitespace that sits between the child tags.
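Because the divs in the question have line breaks between the spans, the raw textContent will usually not be exactly "title1desc1". A small sketch of one way to tidy it, assuming that collapsing whitespace runs to a single space is acceptable for your data (the variable names are illustrative):

```php
<?php
// One div from the question's markup, with the original line breaks kept.
$html = <<<HTML
<html><body>
<div class="thebest">
<span class="title">title1</span>
<span class="desc">desc1</span>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$div = $xpath->query("//div[@class='thebest']")->item(0);

// Raw text of the div and its descendants, whitespace between tags included.
$raw = $div->textContent;

// Collapse every run of whitespace to one space and trim both ends.
$clean = trim(preg_replace('/\s+/', ' ', $raw));

echo $clean, "\n"; // title1 desc1
```

Note that this approach loses the distinction between title and desc; if you need them as separate values, query the spans individually.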

As an alternative, change the XPath to:

$expression="//div[@class='thebest']/span[@class='title' or @class='desc']";
$arts = $xpath->query($expression);

foreach ($arts as $art) {
    echo $art->nodeValue;
}

That fetches, from every div whose class is thebest, the span children whose class is either title or desc, in document order.
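The single query returns one flat list of spans. If you need the title and desc paired up per div, one option is to group the spans by their parent node. The sketch below uses DOMNode::getNodePath() as a grouping key; the $rows structure is just an illustration, not something the question asked for.

```php
<?php
// Cut-down sample markup (an assumption for this sketch, based on the question).
$html = <<<HTML
<html><body>
<div class="thebest"><span class="title">title1</span><span class="desc">desc1</span></div>
<div class="thebest"><span class="title">title2</span><span class="desc">desc2</span></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$spans = $xpath->query("//div[@class='thebest']/span[@class='title' or @class='desc']");

$rows = [];
foreach ($spans as $span) {
    // The parent div's node path is unique per div, so it works as a grouping key.
    $key = $span->parentNode->getNodePath();
    // Store the text under the span's class name ("title" or "desc").
    $rows[$key][$span->getAttribute('class')] = $span->nodeValue;
}
$rows = array_values($rows); // drop the path keys, keep document order

print_r($rows);
```

Each entry of $rows then holds the title and desc of one div, so they can be printed or processed together.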

Gordon answered Nov 11 '22 19:11