
Using XPath to select the href attribute of the following-sibling

I am attempting to scrape the following site: http://www.hudson211.org/zf/profile/service/id/659837

I am trying to select the href of the link next to the "Web Address" text. The following XPath expression gets the tag I am after:

$x("//th[contains(text(), 'Web Address')]/following-sibling::td/a")

returns

<a href="http://www.co.sullivan.ny.us">www.co.sullivan.ny.us</a>

However, when I specifically try to extract the href using @href, the return value is an empty array:

$x("//th[contains(text(), 'Web Address')]/following-sibling::td/a/@href")

returns []

This is the HTML of the row I am looking at:

<tr valign="top">
    <td class="profile_view_left"></td>
    <th align="left" class="profile_view_center">Web Address</th>
    <td class="profile_view_right">
      <a href="http://www.co.sullivan.ny.us">www.co.sullivan.ny.us</a></td>
    <td></td>
</tr>
Kevin George asked Jun 07 '15 00:06





1 Answer

I assume you're using the Google Chrome console, given the $x() function. Your XPath that selects the @href attribute actually works, as I tested in Chrome; the result just isn't displayed in the console the way a selected element is (I'm not sure why at the moment):

>var result = $x("//th[contains(text(), 'Web Address')]/following-sibling::td/a/@href")
undefined
>result[0].value
"http://www.co.sullivan.ny.us"

As you can see, with the exact same expression the variable result contains the expected URL. If you simply want to display a single href value in the console without further processing, this will do:

>$x("//th[contains(text(), 'Web Address')]/following-sibling::td/a/@href")[0].value
"http://www.co.sullivan.ny.us"
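
If the expression could match more than one link, or you want the same thing outside the console where $x() is not available, a rough sketch using the standard document.evaluate() API might look like the following. The helper name attributeValues and its doc parameter are my own for illustration (they are not part of Chrome or of the code above), and it assumes doc is a DOM Document such as the page's document object.

```javascript
// Hypothetical helper (illustration only): collect the string value of every
// attribute node matched by an XPath expression.
// `doc` is assumed to be a DOM Document, e.g. `document` in a browser page.
function attributeValues(doc, expression) {
  // Same number as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in a browser.
  const ORDERED_NODE_SNAPSHOT_TYPE = 7;
  const snapshot = doc.evaluate(expression, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);

  const values = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    // Attribute nodes expose their text through .value, as in result[0].value above.
    values.push(snapshot.snapshotItem(i).value);
  }
  return values;
}

// Possible use in a browser console on the page in question:
// attributeValues(document, "//th[contains(text(), 'Web Address')]/following-sibling::td/a/@href");
```

This returns plain strings rather than attribute nodes, so the console has something it can print directly.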
har07 answered Oct 21 '22 02:10
