
How to return only first match result in XPath?

Tags: php, xml, xpath

I tried to use the XPath substring-after function to grab the data after Property ID:, but the result is not what I want. It returns all of the text matched after Property ID:, and I want only P-000324. Here is my code:

<?php
$getURL = file_get_contents('http://realestate.com.kh/residential-for-rent-in-phnom-penh-daun-penh-phsar-chas-2-beds-apartment-1001192296/');
$dom = new DOMDocument();
@$dom->loadHTML($getURL);
$xpath = new DOMXPath($dom);

echo $xpath->evaluate("normalize-space(substring-after(., 'Property ID:'))");

How can I make it return only the first result?

july77 asked Sep 26 '22 05:09


1 Answer

You can change your XPath expression so that it takes the string after Property ID: from only the first p element that contains it, by using a position index ([1]).

For example, the following XPath expression will select just the first paragraph that directly contains the string 'Property ID:':

(//p[contains(text(),'Property ID:')])[1]
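The parentheses around the path matter here. As a minimal sketch, assuming a made-up two-listing page (the markup below is hypothetical, not the real site), the following compares the expression with and without them. Without the parentheses, [1] is applied per parent element, so every p that is the first match within its own parent is still returned.

```php
<?php
// Hypothetical sample markup standing in for the real page (an assumption).
$html = <<<'HTML'
<html><body>
<div><p>Property ID: P-000324 - Daun Penh</p></div>
<div><p>Property ID: P-000999 - Chamkarmon</p></div>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Every matching paragraph in the document.
$all = $xpath->query("//p[contains(text(),'Property ID:')]");
// [1] without parentheses: first match within each parent element.
$perParent = $xpath->query("//p[contains(text(),'Property ID:')][1]");
// [1] on the parenthesized node-set: first match in document order.
$first = $xpath->query("(//p[contains(text(),'Property ID:')])[1]");

echo $all->length, "\n";       // 2
echo $perParent->length, "\n"; // 2
echo $first->length, "\n";     // 1
```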

Putting this together with your request to return just the string that follows 'Property ID:' but nothing beyond the P-000324 string:

echo $xpath->evaluate("normalize-space(substring-before(substring-after((//p[contains(text(),'Property ID:')])[1], 'Property ID:'), '–'))");

will echo just P-000324 as requested.
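One caveat is easy to check offline: in XPath 1.0, substring-before returns an empty string when the delimiter does not occur in the input. The sketch below assumes a hypothetical one-paragraph page and writes the en dash as "\u{2013}" so the result does not depend on how the PHP file is encoded; the variable names are illustrative only.

```php
<?php
// Hypothetical sample markup (an assumption); &#8211; is an en dash.
$html = <<<'HTML'
<html><body>
<p>Property ID: P-000324 &#8211; Daun Penh</p>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$first = "(//p[contains(text(),'Property ID:')])[1]";
$dash  = "\u{2013}"; // en dash, the delimiter used in the answer's expression

// Delimiter present: the text between the label and the dash is returned.
$withDash = $xpath->evaluate(
    "normalize-space(substring-before(substring-after($first, 'Property ID:'), '$dash'))"
);
// Delimiter absent (no '|' in the paragraph): substring-before yields ''.
$withPipe = $xpath->evaluate(
    "normalize-space(substring-before(substring-after($first, 'Property ID:'), '|'))"
);

var_dump($withDash); // string(8) "P-000324"
var_dump($withPipe); // string(0) ""
```

So the expression only works for pages that use exactly this delimiter after the ID.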

Update: This solves the problem for the page as originally presented, but per the comments the goal seems to be broader. A more robust approach is to use just the first expression to get the string value of the first paragraph containing 'Property ID:', and then apply a regex immediately after the label that matches the normal forms of the property ID or of the delimiters surrounding it. You'll have to use the regex facilities of the host language, because XPath 1.0's string functions are very limited; XPath 2.0's are much better and include regex capabilities.
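A rough sketch of that approach in PHP, using preg_match on the paragraph text. The sample markup and the ID pattern (one uppercase letter, a hyphen, then digits) are assumptions inferred from P-000324 and would need adjusting to the IDs the site actually uses.

```php
<?php
// Hypothetical sample markup (an assumption); note the different delimiter.
$html = <<<'HTML'
<html><body>
<p>Property ID: P-000324 | Daun Penh</p>
<p>Property ID: P-000999 | Chamkarmon</p>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// XPath picks the first matching paragraph; the regex does the string work.
$node = $xpath->query("(//p[contains(text(),'Property ID:')])[1]")->item(0);
$text = $node === null ? '' : $node->textContent;

// Assumed ID shape: uppercase letter, hyphen, digits (e.g. P-000324).
$id = preg_match('/Property ID:\s*([A-Z]-\d+)/', $text, $m) === 1 ? $m[1] : null;

echo $id ?? 'not found', "\n"; // P-000324
```

Because the regex anchors on the ID's own shape rather than on the delimiter that follows it, the same code handles a dash, a pipe, or no delimiter at all.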

kjhughes answered Sep 30 '22 06:09
