I tried to use the XPath function substring-after to grab the data after Property ID:, but the result is not what I want. It shows everything that follows Property ID: on the page, but I want only P-000324. Here is my code:
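<?php
// Download the listing page (the variable holds HTML, not a URL)
$getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');
$dom = new DOMDocument();
@$dom->loadHTML($getURL); // @ hides parser warnings about malformed real-world HTML
$xpath = new DOMXPath($dom);
// "." is the whole document here, so this returns ALL page text after the first "Property ID:"
echo $xpath->evaluate("normalize-space(substring-after(., 'Property ID:'))");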
How can I make it return only the first result?
You can change your XPath expression to take the string after only the first p element that contains Property ID: by using a position index ([1]).
For example, the following XPath expression will select just the first paragraph that directly contains the string 'Property ID:':
(//p[contains(text(),'Property ID:')])[1]
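The parentheses matter. Here is a minimal sketch on hypothetical markup (the two paragraphs and the second ID are made up for illustration): without the parentheses, [1] is applied per parent element, so every div contributes its own first match; with them, the whole node-set is built first and [1] picks the first node in document order.

```php
<?php
// Hypothetical markup: two containers, each with its own "Property ID:" paragraph.
$html = '<html><body>'
      . '<div><p>Property ID: P-000324</p></div>'
      . '<div><p>Property ID: P-000999</p></div>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// [1] binds to the p step: "first matching p child of EACH parent".
$perParent = $xpath->query("//p[contains(text(),'Property ID:')][1]");

// Parenthesized: build all matches first, then take the first in document order.
$firstOnly = $xpath->query("(//p[contains(text(),'Property ID:')])[1]");

echo $perParent->length, "\n"; // 2
echo $firstOnly->length, "\n"; // 1
echo $firstOnly->item(0)->textContent, "\n"; // Property ID: P-000324
```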
Putting this together with your request to return just the string that follows 'Property ID:' and nothing beyond P-000324:
echo $xpath->evaluate("normalize-space(substring-before(substring-after((//p[contains(text(),'Property ID:')])[1], 'Property ID:'), '–'))");
will echo just P-000324, as requested.
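A minimal self-contained sketch of that same expression wrapped in a helper. The function name firstPropertyId and the sample HTML are hypothetical; in your script you would pass the file_get_contents() result instead. It assumes, as the expression does, that the ID is followed by an en dash (–). The meta tag in the sample tells libxml the bytes are UTF-8 so the en dash is decoded correctly and matches the one in the expression.

```php
<?php
/**
 * Hypothetical helper: text between 'Property ID:' and the first en dash,
 * taken from the first <p> whose first text node contains the label.
 * Returns '' if no paragraph matches or no en dash follows the label.
 */
function firstPropertyId(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $expr = "normalize-space(substring-before(substring-after("
          . "(//p[contains(text(),'Property ID:')])[1], 'Property ID:'), '–'))";
    return (string) $xpath->evaluate($expr);
}

// Hypothetical sample page with two listings; only the first ID should come back.
$sample = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . '<p>Property ID: P-000324 – 2 beds apartment</p>'
        . '<p>Property ID: P-000999 – another listing</p>'
        . '</body></html>';

echo firstPropertyId($sample), "\n"; // P-000324
```

Note that if no en dash follows the ID, substring-before() returns an empty string, which is why this expression only fits pages laid out exactly like the original one.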
Update: This solves the problem for the page as it was originally presented, but the comments suggest the goal is broader. A more robust solution is to use just the first expression to get the text of the first paragraph containing 'Property ID:', and then apply regex pattern matching immediately after the label, targeting the usual formats of the property ID or the usual delimiters around it. You will have to use the regex facilities of the host language (in PHP, preg_match), because XPath 1.0's string functions are very limited and PHP's DOMXPath supports only XPath 1.0. XPath 2.0's string functions are much better and include regex support (matches(), replace(), tokenize()).
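A sketch of that two-step approach, under stated assumptions: the helper name extractPropertyId is hypothetical, and the ID pattern (uppercase letters, a hyphen, then digits) is only a guess based on P-000324, so adjust it to the formats the site actually uses. XPath does the structural part; preg_match does the text part.

```php
<?php
/**
 * Hypothetical helper: XPath finds the first <p> mentioning the label,
 * then a regex pulls the ID that follows it. Returns null if nothing matches.
 * ASSUMPTION: IDs look like uppercase letters, '-', digits (e.g. P-000324).
 */
function extractPropertyId(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Step 1 (XPath 1.0): string-value of the first matching paragraph, or ''.
    $text = (string) $xpath->evaluate("string((//p[contains(text(),'Property ID:')])[1])");

    // Step 2 (PCRE): the ID right after the label; whatever delimiter follows is ignored.
    if (preg_match('/Property ID:\s*([A-Z]+-\d+)/', $text, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Hypothetical inputs with different delimiters after the ID.
echo extractPropertyId('<p>Property ID: P-000324 - 2 beds</p>'), "\n";     // P-000324
echo extractPropertyId('<p>Property ID:P-000325 (Daun Penh)</p>'), "\n";  // P-000325
var_dump(extractPropertyId('<p>No ID here</p>'));                          // NULL
```

Keeping the regex anchored on the label, rather than searching the whole page, avoids picking up IDs of related listings elsewhere on the page.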