I'm using the Crawler library, which lets you write XPath expressions to get the content of HTML tags. I'm reading HTML5 content from a page and I want to retrieve text that is not wrapped in a tag of its own, as in this example:
<div class="country">
<strong> USA </strong>
Some text here
</div>
So I'm trying to get the text "Some text here", but the Crawler library only seems to return what is inside a tag, not the text outside it.
Is there an alternative?
Here's the Crawler part:
$crawler = new Crawler();
$crawler->xpathSingle($xml, '//div[@class="country"]/strong/@text');
Either of these XPaths will return "Some text here" as requested:
normalize-space(substring-after(//div[@class="country"], 'USA'))
normalize-space(//div[@class="country"]/strong/following-sibling::text())
Choose based on the sort of variations you wish to accommodate.
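To try both expressions outside the Crawler library, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. This is an assumption on my part, not the Crawler API from the question; the variable names are illustrative only.

```php
<?php
// Sketch only: uses PHP's built-in DOMDocument/DOMXPath rather than the
// Crawler library from the question, whose API is not shown in full here.
$html = <<<HTML
<div class="country">
<strong> USA </strong>
Some text here
</div>
HTML;

libxml_use_internal_errors(true); // keep HTML parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// evaluate() can return a string for string-valued expressions,
// whereas query() only returns node lists.
$viaSubstring = $xpath->evaluate(
    'normalize-space(substring-after(//div[@class="country"], "USA"))'
);
$viaSibling = $xpath->evaluate(
    'normalize-space(//div[@class="country"]/strong/following-sibling::text())'
);

echo $viaSubstring, "\n";
echo $viaSibling, "\n";
```

Both variables should hold "Some text here". The first expression depends on the literal "USA" appearing before the wanted text, while the second depends only on the position of the strong element.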
Credit: The second example is derived from a suggestion first made in a comment by @Keith Hall.
Update:
As I mentioned, you'll need to choose your XPath based on the variations you wish to accommodate. No sooner did I post than you encountered a variation:
<div class="country">
<strong> USA </strong>
Some text here
<i>Do not want this text</i>
</div>
You can exclude "Do not want this text" and return "Some text here" as requested by using the second XPath above but grabbing just the first following text node:
normalize-space(//div[@class="country"]/strong/following-sibling::text()[1])
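As a rough check of how the two approaches behave on the updated markup, the sketch below again assumes PHP's built-in DOMDocument and DOMXPath (not the Crawler library itself) and uses illustrative variable names. It compares the substring-after expression with the first-text-node expression.

```php
<?php
// Sketch only: built-in DOM extension, not the Crawler library from the question.
$html = <<<HTML
<div class="country">
<strong> USA </strong>
Some text here
<i>Do not want this text</i>
</div>
HTML;

libxml_use_internal_errors(true); // keep HTML parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// substring-after works on the string value of the whole div,
// so it also picks up the text inside the <i> element.
$wholeRemainder = $xpath->evaluate(
    'normalize-space(substring-after(//div[@class="country"], "USA"))'
);

// Selecting only the first text node after <strong> leaves the <i> text out.
$firstTextNode = $xpath->evaluate(
    'normalize-space(//div[@class="country"]/strong/following-sibling::text()[1])'
);

echo $wholeRemainder, "\n";
echo $firstTextNode, "\n";
```

Here $wholeRemainder should come out as "Some text here Do not want this text", while $firstTextNode should be just "Some text here", which is why the positional predicate is the safer choice for this variation.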