
How to use XPath to select child text after another child element





I'm using the Crawler library, which lets you run XPath expressions to get the content of HTML tags. I'm currently reading HTML5 content from a page, and I want to retrieve text that is not wrapped in a tag of its own, like this:

<div class="country">
    <strong> USA </strong>
        Some text here
</div>

So I'm trying to get the text "Some text here", but the Crawler library only seems to return what is inside a tag, not the text outside it.

Is there an alternative?

Here's the Crawler part:

$crawler = new Crawler();
$crawler->xpathSingle($xml, '//div[@class="country"]/strong/@text');
KubiRoazhon asked Oct 19 '22 12:10


1 Answer

Either of these XPaths will return "Some text here" as requested:

  • normalize-space(substring-after(//div[@class="country"], 'USA'))

  • normalize-space(//div[@class="country"]/strong/following-sibling::text())

Choose based on the sort of variations you wish to accommodate.

Credit: The second example is derived from a suggestion first made in a comment by @Keith Hall.
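
To try both expressions outside the Crawler library, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. It assumes the markup from the question with a closing </div> added so it parses cleanly; the variable names are illustrative only, and 'USA' is written with double quotes inside the XPath purely to suit PHP's single-quoted strings. Whether the Crawler's xpathSingle() accepts string-valued expressions like these is not something the question shows, so treat this as a way to check the XPath itself.

```php
<?php
// Sample markup from the question; the closing </div> is an assumption.
$html = '<div class="country">
    <strong> USA </strong>
        Some text here
</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Option 1: take everything in the div's string value after "USA".
$bySubstring = $xpath->evaluate(
    'normalize-space(substring-after(//div[@class="country"], "USA"))'
);

// Option 2: take the text node that follows the <strong> element.
$bySibling = $xpath->evaluate(
    'normalize-space(//div[@class="country"]/strong/following-sibling::text())'
);

echo $bySubstring, PHP_EOL; // Some text here
echo $bySibling, PHP_EOL;   // Some text here
```

DOMXPath::evaluate() is used rather than query() because both expressions return a string, not a node list.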


As I mentioned, you'll need to choose your XPath based on the variations you wish to accommodate. No sooner did I post than you encountered a variation:

<div class="country">
    <strong> USA </strong>
        Some text here
    <i>Do not want this text</i>
</div>

You can exclude "Do not want this text" and return "Some text here" as requested by using the second XPath above, but grabbing just the first following text node:

  • normalize-space(//div[@class="country"]/strong/following-sibling::text()[1])
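
As a rough check of how the variation affects each approach, the sketch below runs the substring-after expression and the [1] expression against the second snippet, again with PHP's built-in DOMDocument and DOMXPath and an assumed closing </div>. The variable names are hypothetical. The substring-after form picks up the <i> text because it works on the string value of the whole div, while the [1] form stops at the first text node after <strong>.

```php
<?php
// Second snippet from the answer; the closing </div> is an assumption.
$html = '<div class="country">
    <strong> USA </strong>
        Some text here
    <i>Do not want this text</i>
</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// substring-after sees the text of every descendant, including <i>.
$wholeTail = $xpath->evaluate(
    'normalize-space(substring-after(//div[@class="country"], "USA"))'
);

// [1] keeps only the first text node following <strong>.
$firstText = $xpath->evaluate(
    'normalize-space(//div[@class="country"]/strong/following-sibling::text()[1])'
);

echo $wholeTail, PHP_EOL; // Some text here Do not want this text
echo $firstText, PHP_EOL; // Some text here
```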
kjhughes answered Nov 15 '22 07:11
