<doc ok="yes">
  <a>
    <b>
      <c>
        aa
        <d ok="yes">
          bb
        </d>
        cc
      </c>
    </b>
  </a>
  <e>
    ee
  </e>
  <f ok="no">
    no
  </f>
</doc>
I need to retrieve a list of nodes using XPath, where each node must satisfy these conditions:
(1) the node has at least one child text node
(2) if the node (or the closest node on the ancestor axis) has an attribute "ok", its value must be "yes"
(3) if any ancestor is part of the result, the node is excluded
So in my sample I would like to get <c> and <e>. Node <d> is excluded because it is a child of <c>, which is part of the result.
I started with condition (1) using this expression: //*[count(./text()[normalize-space()])>0]. It returns <c>, <d>, <e> and <f>. I have no idea how to exclude <d>.
I would divide this into two steps. First, consider only conditions 1 and 2:
//*[text()[normalize-space()]]
   [
      ancestor-or-self::*[not(@ok)]
         or
      ancestor-or-self::*[@ok][1][@ok='yes']
   ]
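Given the XML in the question as input, the XPath above returns three elements: <c>, <d>, and <e>. Note that ancestor-or-self::*[not(@ok)] succeeds as soon as any one element on that axis lacks the attribute. If elements without "ok" can appear above or below an ok="no" element, not(ancestor-or-self::*[@ok]) is the stricter form of that test.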
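As a cross-check of this first step, here is a minimal sketch that restates conditions 1 and 2 procedurally in Python. It does not run the XPath itself, because the standard library's xml.etree.ElementTree supports only a limited XPath subset. The helper names (has_text_child, nearest_ok, step_one) are made up for illustration, the sample XML is a compacted copy of the one in the question, and condition 2 is read as "the nearest ok attribute, if there is one, must be yes".

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample document from the question.
XML = """<doc ok="yes">
  <a><b><c>aa<d ok="yes">bb</d>cc</c></b></a>
  <e>ee</e>
  <f ok="no">no</f>
</doc>"""


def has_text_child(elem):
    """Condition 1: the element has at least one non-blank child text node."""
    # ElementTree keeps child text nodes in elem.text and in each child's .tail.
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)


def nearest_ok(elem, parent_of):
    """Return the "ok" value of the closest ancestor-or-self that has one, else None."""
    node = elem
    while node is not None:
        if "ok" in node.attrib:
            return node.attrib["ok"]
        node = parent_of.get(node)
    return None


def step_one(root):
    """Tags of all elements meeting conditions 1 and 2 (hypothetical helper)."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [
        elem.tag
        for elem in root.iter()
        if has_text_child(elem) and nearest_ok(elem, parent_of) in (None, "yes")
    ]


print(step_one(ET.fromstring(XML)))  # ['c', 'd', 'e']
```

On the sample this agrees with the XPath: <f> drops out because its own ok is "no", while <d> is still present because condition 3 has not been applied yet.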
The next step is implementing condition 3. That can be done by repeating the same predicates used in the first step, but applied to ancestor::* instead of the current node. The repeated predicates are then negated with not(), because no ancestor of the current node may satisfy conditions 1 and 2 (in other words, no ancestor may already be part of the result):
[not(
   ancestor::*[text()[normalize-space()]]
      [
         ancestor-or-self::*[not(@ok)]
            or
         ancestor-or-self::*[@ok][1][@ok='yes']
      ]
   )
]
Combining both steps gives the following XPath:
//*[text()[normalize-space()]]
   [
      ancestor-or-self::*[not(@ok)]
         or
      ancestor-or-self::*[@ok][1][@ok='yes']
   ]
   [not(
      ancestor::*[text()[normalize-space()]]
         [
            ancestor-or-self::*[not(@ok)]
               or
            ancestor-or-self::*[@ok][1][@ok='yes']
         ]
      )
   ]
Each of the outer predicates ([]) in the final XPath represents, in order, conditions 1, 2, and 3.
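If the selection is done in code rather than in a single expression, condition 3 can also be handled by a top-down walk that stops descending as soon as an element is selected. The following is only a sketch of that alternative, not a translation of the XPath: it uses Python's standard xml.etree.ElementTree, the function name select is hypothetical, and it assumes condition 2 means "the nearest ok attribute, if any, must be yes".

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample document from the question.
XML = """<doc ok="yes">
  <a><b><c>aa<d ok="yes">bb</d>cc</c></b></a>
  <e>ee</e>
  <f ok="no">no</f>
</doc>"""


def select(elem, inherited_ok=None):
    """Return the tags of the outermost elements meeting conditions 1 and 2."""
    # Condition 2: an element's own "ok" overrides the value inherited from above.
    ok = elem.attrib.get("ok", inherited_ok)
    # Condition 1: child text nodes live in elem.text and in each child's .tail.
    chunks = [elem.text] + [child.tail for child in elem]
    has_text = any(chunk and chunk.strip() for chunk in chunks)
    if has_text and ok in (None, "yes"):
        # Condition 3: once an element is selected, its descendants are skipped.
        return [elem.tag]
    selected = []
    for child in elem:
        selected.extend(select(child, ok))
    return selected


print(select(ET.fromstring(XML)))  # ['c', 'e']
```

Passing the effective ok value down the recursion avoids walking the ancestor axis for every node, and the early return is what keeps <d> out of the result once <c> has been selected.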