
XPath - get parent of text nodes with condition

Tags:

xpath

<doc ok="yes">
    <a>
        <b>
            <c>
                aa
                <d ok="yes">
                    bb
                </d>
                cc
            </c>
        </b>
    </a>
    <e>
        ee
    </e>
    <f ok="no">
        no
    </f>
</doc>

I need to retrieve a list of nodes using XPath, where each node must satisfy these conditions:

  1. node has at least one child text node

  2. if the node (or closest node in ancestor axis) has an attribute "ok", the value must be "yes"

  3. if any ancestor is part of the result, exclude the node

So in my sample I would like to get <c> and <e>. Node <d> is excluded because it is a child of <c>, which is part of the result.

I started with condition (1), using the expression //*[count(./text()[normalize-space()])>0]. It returns <c>, <d>, <e> and <f>. I have no idea how to exclude <d>.

Iale asked Jan 08 '23 08:01



1 Answer

I would divide this into two steps. First, consider only conditions 1 and 2.

//*[text()[normalize-space()]]
   [
      not(ancestor-or-self::*[@ok])
        or
      ancestor-or-self::*[@ok][1][@ok='yes']
   ]

Given the XML in the question as input, the XPath above returns three elements: <c>, <d>, and <e>.
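
To check conditions 1 and 2 outside an XPath engine, here is a minimal Python sketch that uses only the standard library. ElementTree's limited XPath support does not cover the ancestor axes, so the two conditions are written as plain functions that follow the wording of the question. The names has_text_child, nearest_ok, step_one and SAMPLE are made up for this illustration, and SAMPLE is the question's XML with the whitespace trimmed.

```python
import xml.etree.ElementTree as ET

# Assumption: the question's sample document, with whitespace trimmed.
SAMPLE = """<doc ok="yes">
    <a><b><c>aa<d ok="yes">bb</d>cc</c></b></a>
    <e>ee</e>
    <f ok="no">no</f>
</doc>"""


def has_text_child(elem):
    # Condition 1: ElementTree keeps an element's child text nodes in
    # elem.text and in the .tail of each child element.
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)


def nearest_ok(elem, parent_of):
    # Condition 2: walk self, parent, grandparent, ... and return the first
    # "ok" value found, or None when no element on that path has one.
    while elem is not None:
        if "ok" in elem.attrib:
            return elem.get("ok")
        elem = parent_of.get(elem)
    return None


def step_one(root):
    # Map every element to its parent so the ancestor chain can be followed.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [
        elem.tag
        for elem in root.iter()
        if has_text_child(elem) and nearest_ok(elem, parent_of) in (None, "yes")
    ]


print(step_one(ET.fromstring(SAMPLE)))  # ['c', 'd', 'e']
```

<f> drops out because its own ok attribute is "no", while <d> is still present at this stage because condition 3 has not been applied yet.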

The next step is to implement condition 3. This can be done by repeating the predicates from the first step, but applied to ancestor::* instead of the current node, and then negating the result with not(). No ancestor of the current node may satisfy conditions 1 and 2, because such an ancestor (or one of its own ancestors) would already be part of the result:

[not(
        ancestor::*[text()[normalize-space()]]
        [
            not(ancestor-or-self::*[@ok])
                or
            ancestor-or-self::*[@ok][1][@ok='yes']
        ]
    )
]

Combining both steps gives the following XPath:

//*[text()[normalize-space()]]
   [
      not(ancestor-or-self::*[@ok])
        or
      ancestor-or-self::*[@ok][1][@ok='yes']
   ]
   [not(
           ancestor::*[text()[normalize-space()]]
           [
               not(ancestor-or-self::*[@ok])
                   or
               ancestor-or-self::*[@ok][1][@ok='yes']
           ]
       )
   ]

The three outer predicates ([]) in the final XPath represent, in order, conditions 1, 2, and 3.
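
As a cross-check, the same three conditions can be sketched as a top-down walk in Python (standard library only). This only illustrates the logic of the question's conditions and is not how an XPath engine evaluates the expression. The names select and SAMPLE are hypothetical, and SAMPLE is the question's XML with the whitespace trimmed.

```python
import xml.etree.ElementTree as ET

# Assumption: the question's sample document, with whitespace trimmed.
SAMPLE = """<doc ok="yes">
    <a><b><c>aa<d ok="yes">bb</d>cc</c></b></a>
    <e>ee</e>
    <f ok="no">no</f>
</doc>"""


def select(elem, inherited_ok=None):
    # Nearest "ok" value: the element's own, otherwise the value inherited
    # from the closest ancestor that has the attribute (None if none does).
    ok = elem.get("ok", inherited_ok)
    # Child text nodes live in elem.text and in each child's .tail.
    chunks = [elem.text] + [child.tail for child in elem]
    has_text = any(chunk and chunk.strip() for chunk in chunks)
    if has_text and ok in (None, "yes"):
        # Conditions 1 and 2 hold. Returning here skips every descendant,
        # which is condition 3.
        return [elem]
    found = []
    for child in elem:
        found.extend(select(child, ok))
    return found


print([elem.tag for elem in select(ET.fromstring(SAMPLE))])  # ['c', 'e']
```

Stopping the descent at the first match plays the role of the not(ancestor::*[...]) predicate: a node is only ever visited when none of its ancestors has been selected.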

har07 answered Jan 18 '23 10:01
