XPath

Question

<doc ok="yes">
    <a>
        <b>
            <c>
                aa
                <d ok="yes">
                    bb
                </d>
                cc
            </c>
        </b>
    </a>
    <e>
        ee
    </e>
    <f ok="no">
        no
    </f>
</doc>

I need to retrieve a list of nodes using XPath, where each node satisfies these conditions:

1. The node has at least one child text node.
2. If the node (or, failing that, its closest ancestor that has one) has an attribute "ok", the value must be "yes".
3. If any ancestor of the node is part of the result, the node is excluded.

So in my sample I would like to get <c> and <e>. Node <d> is excluded because it is a child of <c>, which is a part of the result.

I started with condition (1) using the expression //*[count(./text()[normalize-space()])>0]. It returns <c>, <d>, <e> and <f>, but I have no idea how to exclude <d>.
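As a cross-check of condition (1) alone, here is a minimal Python sketch. It does not run the XPath: the standard-library ElementTree module supports only a small XPath subset, so the condition is re-implemented by hand. The SAMPLE string is a whitespace-condensed copy of the XML above, and has_text_child is a hypothetical helper name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Whitespace-condensed copy of the sample document from the question.
SAMPLE = """<doc ok="yes">
    <a><b><c>aa<d ok="yes">bb</d>cc</c></b></a>
    <e>ee</e>
    <f ok="no">no</f>
</doc>"""


def has_text_child(elem):
    """True if elem has at least one non-whitespace child text node."""
    # ElementTree stores an element's child text as elem.text (before the
    # first child element) plus the .tail of each child element.
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)


root = ET.fromstring(SAMPLE)
# iter() walks the tree in document order, starting with the root itself.
condition_1 = [elem.tag for elem in root.iter() if has_text_child(elem)]
print(condition_1)  # ['c', 'd', 'e', 'f']
```

This matches the four elements reported for the expression above; <doc>, <a> and <b> drop out because their only text children are whitespace.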

har07 · Accepted Answer

I would divide this into two steps. First, consider only conditions 1 and 2.

//*[text()[normalize-space()]]
   [
      ancestor-or-self::*[not(@ok)] 
        or 
      ancestor-or-self::*[@ok][1][@ok='yes']
    ]

Given the XML in the question as input, the above XPath returns 3 elements: <c>, <d>, and <e>.
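The first step can be sanity-checked with a short Python sketch that applies conditions 1 and 2 as the question words them, using a tree walk that carries the nearest "ok" value downwards. This is a re-implementation under stated assumptions, not an evaluation of the XPath itself (the standard-library ElementTree module has no ancestor axes); walk, has_text_child and passes_1_and_2 are hypothetical names, and SAMPLE is a whitespace-condensed copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# Whitespace-condensed copy of the sample document from the question.
SAMPLE = """<doc ok="yes">
    <a><b><c>aa<d ok="yes">bb</d>cc</c></b></a>
    <e>ee</e>
    <f ok="no">no</f>
</doc>"""


def has_text_child(elem):
    """Condition 1: at least one non-whitespace child text node."""
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)


def walk(elem, inherited_ok=None):
    """Yield (element, nearest "ok" value or None) in document order."""
    # The element's own attribute wins; otherwise the value of the
    # closest ancestor that carries "ok" applies.
    ok = elem.get("ok", inherited_ok)
    yield elem, ok
    for child in elem:
        yield from walk(child, ok)


def passes_1_and_2(xml_text):
    root = ET.fromstring(xml_text)
    return [elem.tag for elem, ok in walk(root)
            if has_text_child(elem) and ok in (None, "yes")]


step_1 = passes_1_and_2(SAMPLE)
print(step_1)  # ['c', 'd', 'e']

# Edge case outside the sample: an element without "ok" below ok="no".
edge = passes_1_and_2('<r><f ok="no"><g>text</g></f></r>')
print(edge)  # []
```

On the sample this agrees with the three elements listed above. On other inputs the two may differ: in the XPath, the branch ancestor-or-self::*[not(@ok)] is true as soon as any element in the chain lacks "ok", so <g> in the edge case would be selected even though its closest "ok" is "no". If the stricter reading of condition 2 is wanted, writing that branch as not(ancestor-or-self::*[@ok]) is one possible adjustment.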

The next step is implementing condition 3. That can be done by repeating the same predicates that were used in the first step, but applied to ancestor::* instead of the current node. The repeated predicates are then negated with not(), because we want every ancestor to fail conditions 1 and 2 (that is, no ancestor of the current node may already be part of the result):

[not(
        ancestor::*[text()[normalize-space()]]
        [
            ancestor-or-self::*[not(@ok)] 
                or 
            ancestor-or-self::*[@ok][1][@ok='yes']
        ]
    )
]

Combining both steps gives the following XPath:

//*[text()[normalize-space()]]
   [
      ancestor-or-self::*[not(@ok)] 
        or 
      ancestor-or-self::*[@ok][1][@ok='yes']
    ]
    [not(
            ancestor::*[text()[normalize-space()]]
            [
                ancestor-or-self::*[not(@ok)] 
                    or 
                ancestor-or-self::*[@ok][1][@ok='yes']
            ]
        )
    ]

Each of the outer predicates ([]) in the final XPath represents, in order, conditions 1, 2, and 3.
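For comparison, all three conditions can be sketched as a recursive walk in Python. This is a hand-written equivalent of the question's conditions, not an evaluation of the XPath (the standard-library ElementTree module does not support the ancestor axes); select and has_text_child are hypothetical names, and SAMPLE is a whitespace-condensed copy of the question's XML. Condition 3 falls out of the recursion: once an element is selected, its descendants are never visited.

```python
import xml.etree.ElementTree as ET

# Whitespace-condensed copy of the sample document from the question.
SAMPLE = """<doc ok="yes">
    <a><b><c>aa<d ok="yes">bb</d>cc</c></b></a>
    <e>ee</e>
    <f ok="no">no</f>
</doc>"""


def has_text_child(elem):
    """Condition 1: at least one non-whitespace child text node."""
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)


def select(elem, inherited_ok=None):
    """Return the topmost elements that satisfy conditions 1 and 2."""
    # Condition 2: use the element's own "ok", else the nearest ancestor's.
    ok = elem.get("ok", inherited_ok)
    if has_text_child(elem) and ok in (None, "yes"):
        # Condition 3: stop here, so no descendant can enter the result.
        return [elem]
    selected = []
    for child in elem:
        selected.extend(select(child, ok))
    return selected


root = ET.fromstring(SAMPLE)
result = [elem.tag for elem in select(root)]
print(result)  # ['c', 'e']
```

The walk returns <c> and <e>, the result asked for in the question: <d> is skipped because the recursion stops at <c>, and <f> is rejected by its own ok="no".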

XPath - get parent of text nodes with condition

Iale
