So right now if I have something like this:
//div[@class='artist']/p[x]/text()
x can either be 3 or 4, or maybe even a different number. Luckily, if what I am looking for is not in 3, I can just check for null and go on until I find text. The issue is I would rather know I'm going to the right element every time. So I tried this:
div[@class='people']/h3[text()='h3 text']/p/text()
since there will always be a <p> right after <h3>h3 text</h3>. However, this never returns anything and usually results in an error. If I remove /p, I get 'h3 text' returned.
Anyway, how do I get the <p> directly after the <h3>?
BTW, I'm using HTMLCleaner in Java for this.
By default, when you don't specify an axis you get the child:: axis, which is why the / operator seems to descend the DOM tree child by child. There is an implied child:: after each slash.
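To make the implied child:: concrete, here is a minimal sketch using the JDK's built-in javax.xml.xpath rather than HTMLCleaner. The markup and the class name ChildAxisDemo are made up for illustration. The short and long forms select the same node, and /p after the <h3> finds nothing because the <p> is not a child of the <h3>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ChildAxisDemo {
    // Hypothetical sample markup, kept well-formed so the JDK XML parser accepts it.
    static final String MARKUP =
        "<html><body><div class='people'>"
      + "<h3>h3 text</h3><p>paragraph</p>"
      + "</div></body></html>";

    // Evaluates an XPath 1.0 expression and returns the string value of the first match.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(MARKUP)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated form: each step uses the child:: axis implicitly.
        System.out.println(eval("/html/body/div/h3"));
        // Same path with the axis spelled out.
        System.out.println(eval("/child::html/child::body/child::div/child::h3"));
        // The <p> is not a child of the <h3>, so this prints an empty line.
        System.out.println(eval("/html/body/div/h3/p"));
    }
}
```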
In your case you don't want to find a child of the <h3>, you want to find a sibling of it. A sibling is an element at the same nesting level, under the same parent. Specifically, you should use the following-sibling:: axis.
div[@class='people']/h3[text()='h3 text']/following-sibling::p/text()
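Note that following-sibling::p matches every <p> sibling that comes after the <h3>, not only the nearest one. Add [1], as in following-sibling::p[1], if you only want the first.
Axes are an advanced feature of XPath. They are one of the features that make XPath especially powerful.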
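Here is a rough sketch of the difference, again assuming a full XPath 1.0 engine (the JDK's javax.xml.xpath) and invented markup. As far as I know, HTMLCleaner's own evaluateXPath only handles a subset of XPath, so you may need to hand the cleaned document to an engine like this one. FollowingSiblingDemo is just a placeholder name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FollowingSiblingDemo {
    // Hypothetical sample markup: the <p> elements are siblings of the <h3>, not children.
    static final String MARKUP =
        "<html><body><div class='people'>"
      + "<p>intro</p>"
      + "<h3>h3 text</h3>"
      + "<p>first paragraph after the heading</p>"
      + "<p>second paragraph after the heading</p>"
      + "</div></body></html>";

    // Evaluates an XPath 1.0 expression and returns the string value of the first match.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(MARKUP)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String heading = "//div[@class='people']/h3[text()='h3 text']";
        // Child step: the <h3> has no <p> children, so this prints an empty line.
        System.out.println(eval(heading + "/p/text()"));
        // Sibling step: [1] keeps only the nearest following <p>.
        System.out.println(eval(heading + "/following-sibling::p[1]/text()"));
        // prints "first paragraph after the heading"
    }
}
```

If the <p> must be the very next element (with nothing else in between), following-sibling::*[1][self::p] is a stricter variant.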
You're already familiar with one other axis, though you may not have realized it: the @ symbol is shorthand for attribute::. When you write @href you're really saying attribute::href, as in look for an attribute called "href" instead of a child.
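A quick way to convince yourself, sketched with the JDK's javax.xml.xpath and a made-up link (the URL and the class name AttributeAxisDemo are placeholders): both spellings return the attribute value, while a bare href step looks for a child element and comes back empty.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttributeAxisDemo {
    // Hypothetical sample markup with a placeholder URL.
    static final String MARKUP =
        "<div><a href='https://example.com/'>link</a></div>";

    // Evaluates an XPath 1.0 expression and returns the string value of the first match.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(MARKUP)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("//a/@href"));            // abbreviated form
        System.out.println(eval("//a/attribute::href"));  // same thing, axis spelled out
        System.out.println(eval("//a/href"));             // child element named href: empty line
    }
}
```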
Axes, eh? Shorthand, eh? Tell me more, you say? OK!
. and .. are shorthand for the more verbose self::node() and parent::node(), respectively. You could use the longer forms if you wished.
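For instance, stepping from the <h3> back up to its parent works with either spelling. This is only a sketch against invented markup, using the JDK's javax.xml.xpath, and SelfParentDemo is a placeholder name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SelfParentDemo {
    // Hypothetical sample markup.
    static final String MARKUP =
        "<div class='people'><h3>h3 text</h3><p>paragraph</p></div>";

    // Evaluates an XPath 1.0 expression and returns the string value of the first match.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(MARKUP)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Parent of the <h3>, short and long form: both print "people".
        System.out.println(eval("//h3/../@class"));
        System.out.println(eval("//h3/parent::node()/@class"));
        // The <h3> itself, short and long form: both print "h3 text".
        System.out.println(eval("//h3/./text()"));
        System.out.println(eval("//h3/self::node()/text()"));
    }
}
```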
The // operator you commonly see as //p or body//a has a hidden descendant-or-self::node() between the slashes. //p is shorthand for /descendant-or-self::node()/p.
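You can check the expansion by counting matches for both spellings. Again a sketch only: JDK javax.xml.xpath, invented markup, and DescendantDemo is a placeholder name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DescendantDemo {
    // Hypothetical sample markup with <p> elements at two different depths.
    static final String MARKUP =
        "<html><body><p>one</p><div><p>two</p><a href='#'>link</a></div></body></html>";

    // Returns how many nodes an XPath 1.0 expression selects in MARKUP.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(MARKUP)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Both forms find the two <p> elements, regardless of depth.
        System.out.println(count("//p"));
        System.out.println(count("/descendant-or-self::node()/p"));
        // Both forms find the single <a> somewhere below <body>.
        System.out.println(count("/html/body//a"));
        System.out.println(count("/html/body/descendant-or-self::node()/a"));
    }
}
```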