
How to use XPath to select child text after another child element

Tags: html, xml, xpath

I'm using the Crawler library, which lets you run XPath expressions to get the content of HTML tags. I'm reading HTML5 content from a page, and I want to retrieve text that is not wrapped in a tag of its own, as in this markup:

<div class="country">
    <strong> USA </strong>
        Some text here
</div>

I'm trying to get the text "Some text here", but the Crawler library only seems to return what is inside a tag, not what sits outside it.

Is there an alternative?

Here's the Crawler part:

$crawler = new Crawler();
$crawler->xpathSingle($xml, '//div[@class="country"]/strong/@text');
KubiRoazhon asked Oct 19 '22 12:10


1 Answer

Either of these XPaths will return "Some text here" as requested:

  • normalize-space(substring-after(//div[@class="country"], 'USA'))

  • normalize-space(//div[@class="country"]/strong/following-sibling::text())

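Choose based on the sort of variations you wish to accommodate: the first depends on the literal text 'USA' appearing before the text you want, while the second depends on the wanted text being a sibling that follows the strong element.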

Credit: The second example is derived from a suggestion first made in a comment by @Keith Hall.
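
For anyone who wants to try the two expressions above, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. These stand in for the asker's Crawler library, whose API is not shown beyond xpathSingle, so this is an assumption about how you might test the XPath rather than how that library works. The variable names ($markup, $bySubstring, $bySibling) are purely illustrative.

```php
<?php
// Sample markup from the question. It is well-formed, so loadXML() can parse it.
$markup = <<<'XML'
<div class="country">
    <strong> USA </strong>
        Some text here
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($markup);

// DOMXPath stands in for the Crawler library here (an assumption, see above).
$xpath = new DOMXPath($doc);

// evaluate() returns a plain string for string-valued expressions
// such as normalize-space(...).
$bySubstring = $xpath->evaluate(
    'normalize-space(substring-after(//div[@class="country"], "USA"))'
);
$bySibling = $xpath->evaluate(
    'normalize-space(//div[@class="country"]/strong/following-sibling::text())'
);

var_dump($bySubstring); // string(14) "Some text here"
var_dump($bySibling);   // string(14) "Some text here"
```

Note that evaluate() is used rather than query(), because query() only returns node lists and these expressions produce strings.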


Update:

As I mentioned, you'll need to choose your XPath based on the variations you wish to accommodate. No sooner did I post than you encountered a variation:

<div class="country">
    <strong> USA </strong>
        Some text here
    <i>Do not want this text</i>
</div>

You can exclude "Do not want this text" and still return "Some text here" as requested by using the second XPath above and selecting only the first following text node:

  • normalize-space(//div[@class="country"]/strong/following-sibling::text()[1])
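
To see why the choice of expression matters for this variation, here is a small sketch that runs both styles against the updated markup. It again assumes PHP's built-in DOMDocument and DOMXPath rather than the asker's Crawler library, and the variable names are illustrative only.

```php
<?php
// Updated markup from the question, now with a trailing <i> element.
$markup = <<<'XML'
<div class="country">
    <strong> USA </strong>
        Some text here
    <i>Do not want this text</i>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);

// substring-after() works on the whole string value of the div,
// so the text inside <i> is included as well.
$bySubstring = $xpath->evaluate(
    'normalize-space(substring-after(//div[@class="country"], "USA"))'
);

// The [1] predicate keeps only the first text node that follows <strong>;
// the text inside <i> is a child of <i>, not a sibling text node.
$firstText = $xpath->evaluate(
    'normalize-space(//div[@class="country"]/strong/following-sibling::text()[1])'
);

var_dump($bySubstring); // string(36) "Some text here Do not want this text"
var_dump($firstText);   // string(14) "Some text here"
```

The substring-after form is therefore only suitable when nothing unwanted follows the target text inside the same div.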
kjhughes answered Nov 15 '22 07:11