
Xpath to get the 2nd url with the matching text in the href tag

An HTML page has paging links: one set at the top of the page and another at the bottom.

Using HtmlUnit, I am currently getting the HtmlAnchor on the page using getAnchorByText("1");

There is a problem in some of the links on the top, so I want to reference the bottom links using XPath.

nextPageAnchor = (HtmlAnchor) page.getByXPath("");

How can I reference the 2nd link on the page using XPath?

I need to reference the link by its anchor text, so a link like:

<a href="....">33</a>

The href contains unpredictable text (it is a JavaScript function call), so I cannot match on it.

Is this possible with XPath?

Blankman asked Dec 10 '22 16:12


2 Answers

To select the second a element anywhere in the document:

(//a)[2]

To select the second a element with a particular text in the href attribute:

(//a[@href='...'])[2]

Note that the parentheses are required. The expression //a[2] will not do what you intend: it selects every a element that is the second a child of its parent. If your input is

<p>Link <a href="one.html">One</a></p>
<p>Link <a href="two.html">Two</a> and <a href="three.html">Three</a>.</p>
<p>Link <a href="four.html">Four</a> and <a href="five.html">Five</a>.</p>

(//a)[2] will return the second link (two.html), while //a[2] will return the third and fifth links (three.html and five.html), since each of these is the second a child of its parent.
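The difference can be checked with a minimal sketch that uses the JDK's built-in javax.xml.xpath API rather than HtmlUnit. It assumes the three sample paragraphs are wrapped in a <body> root so they parse as well-formed XML; the class name SecondLinkDemo and the helper hrefs are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SecondLinkDemo {
    // The sample input from the answer, wrapped in a <body> root
    // (an assumption, needed only so the fragment is well-formed XML).
    static final String SAMPLE =
        "<body>"
        + "<p>Link <a href=\"one.html\">One</a></p>"
        + "<p>Link <a href=\"two.html\">Two</a> and <a href=\"three.html\">Three</a>.</p>"
        + "<p>Link <a href=\"four.html\">Four</a> and <a href=\"five.html\">Five</a>.</p>"
        + "</body>";

    // Evaluates an XPath 1.0 expression that selects elements and
    // returns the href attribute of every match.
    static List<String> hrefs(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Second a element in the whole document.
        System.out.println(hrefs(SAMPLE, "(//a)[2]"));
        // Every a element that is the second a child of its parent.
        System.out.println(hrefs(SAMPLE, "//a[2]"));
    }
}
```

Running main should print [two.html] for the parenthesized form and [three.html, five.html] for the unparenthesized one.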

markusk answered Dec 14 '22 22:12


It's pretty simple:

 (//a)[2]

The //a selects all anchors on the page and the [2] picks the second one. XPath positions are one-indexed, not zero-indexed, so 2 really is the 2nd element, not the 3rd as it would be with an array index.

If you want to get a link with the text of 33 then you can use:

 //a[./text() = "33"]

See http://www.w3.org/TR/xpath/ for the full XPath definition.

EDIT

To address Alexandre's comment, you could use

 (//a[./text() = "33"])[2]

This will first select all <a> tags with a text of 33, and then it will select the second of those.
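As a rough check of that expression, here is a sketch using the JDK's javax.xml.xpath API instead of HtmlUnit. The page markup, the javascript: hrefs, the class name PagerLinkDemo and the helper matchedHrefs are all hypothetical; they only stand in for a page with one pager at the top and one at the bottom.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PagerLinkDemo {
    // Hypothetical page: the same page numbers appear in a top and a bottom pager,
    // and the hrefs are opaque JavaScript calls that cannot be predicted.
    static final String PAGE =
        "<body>"
        + "<div><a href=\"javascript:top32()\">32</a> <a href=\"javascript:top33()\">33</a></div>"
        + "<p>Page content</p>"
        + "<div><a href=\"javascript:bottom32()\">32</a> <a href=\"javascript:bottom33()\">33</a></div>"
        + "</body>";

    // Evaluates an XPath 1.0 expression that selects elements and
    // returns the href attribute of every match, in document order.
    static List<String> matchedHrefs(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // XPath string literals may use single quotes, which avoids escaping in Java.
        System.out.println(matchedHrefs(PAGE, "//a[./text() = '33']"));      // both "33" links
        System.out.println(matchedHrefs(PAGE, "(//a[./text() = '33'])[2]")); // bottom "33" link only
        System.out.println(matchedHrefs(PAGE, "//a[./text() = '33'][2]"));   // no match without parentheses
    }
}
```

In HtmlUnit itself, the same expression string could presumably be passed to page.getFirstByXPath(...) and the result cast to HtmlAnchor, since getByXPath returns a list rather than a single node.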

EDIT 2

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.

markusk is indeed correct. The note quoted above comes from the XPath specification linked earlier in this answer.

Jonathan Fingland answered Dec 15 '22 00:12