
XPath: exclude an element and all its children when a parent attribute contains a value

Example markup:

<div class="post-content">
    <p>
        <moredepth>
            <...>
                <span class="image-container float_right">
                    <div class="some_element">
                        image1
                    </div>
                    <p>do not need this</p>
                </span>
                <div class="image-container float_right">
                    image2
                </div>
                <p>text1</p>
                <li>text2</li>
            </...>
        </moredepth>
    </p>
</div>

The hard part is that an "image-container" element can appear at any depth.

The XPath I am trying to use:

//div[contains(@class, 'post-content')]//*[not(contains(@class, 'image-container'))]

What XPath should I use to exclude every "image-container" element itself, plus "some_element" and any other descendants of an "image-container" at any depth?

The output for this example should be:

<p>
    <moredepth>
        <...>

            <p>text1</p>
            <li>text2</li>
        </...>
    </moredepth>
</p>

P.S. Is it possible to make such a selection using CSS?

Asked by valerij vasilcenko, Feb 26 '15 09:02


1 Answer

You can apply the Kaysian method for obtaining the intersection of two node-sets. You have two sets:

A: The elements which descend from //div[contains(@class, 'post-content')]. The ancestor axis leaves out that div itself (since you don't want the root div):

//*[ancestor::div[contains(@class, 'post-content')]]

B: The elements that neither carry the 'image-container' class themselves nor have an ancestor that does. The ancestor-or-self axis includes the element itself, so the entire tree is excluded, including the div and span that carry the class:

//*[not(ancestor-or-self::*[contains(@class, 'image-container')])] 

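The intersection of those two sets is the solution to your problem. The formula of the Kaysian method is: A [ count(. | B) = count(B) ]. It keeps a node of A only when adding that node to B does not change the size of B, that is, when the node is already in B. Applying that to your problem, the result you need is: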

//*[ancestor::div[contains(@class, 'post-content')]]
   [ count(. | //*[not(ancestor-or-self::*[contains(@class, 'image-container')])])
     = 
     count(//*[not(ancestor-or-self::*[contains(@class, 'image-container')])]) ]

This will select the following elements from your example code:

/div/p
/div/p/moredepth
/div/p/moredepth/...
/div/p/moredepth/.../p
/div/p/moredepth/.../li

It excludes the span and the div that match the unwanted class, together with their descendants.

You can then add extra steps to the expression to select exactly the text or nodes you need.
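As a rough way to check the result, the two sets can be rebuilt by hand and intersected. The sketch below is only an illustration, not the XPath expression itself: it uses Python's standard xml.etree.ElementTree, which has no ancestor axes, so ancestor information is carried down a recursive walk instead. The element name "wrapper" is a hypothetical stand-in for the "<...>" placeholder in the question (which is not a legal XML name), and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Markup from the question. "<wrapper>" is a hypothetical stand-in for the
# "<...>" placeholder, which is not a legal XML element name.
MARKUP = """
<div class="post-content">
    <p>
        <moredepth>
            <wrapper>
                <span class="image-container float_right">
                    <div class="some_element">image1</div>
                    <p>do not need this</p>
                </span>
                <div class="image-container float_right">image2</div>
                <p>text1</p>
                <li>text2</li>
            </wrapper>
        </moredepth>
    </p>
</div>
"""


def class_contains(elem, needle):
    """Mimic XPath contains(@class, needle); a missing attribute never matches."""
    return needle in elem.get("class", "")


def walk(elem, path, in_post, in_image, paths, set_a, set_b):
    """Visit elem and its descendants, filling set A, set B and a path lookup."""
    paths[elem] = path
    # ancestor-or-self::*[contains(@class, 'image-container')]
    excluded = in_image or class_contains(elem, "image-container")
    if in_post:        # set A: has a post-content div as an ancestor
        set_a.add(elem)
    if not excluded:   # set B: not inside (and not itself) an image-container
        set_b.add(elem)
    child_in_post = in_post or (
        elem.tag == "div" and class_contains(elem, "post-content")
    )
    for child in elem:
        walk(child, path + "/" + child.tag, child_in_post, excluded,
             paths, set_a, set_b)


root = ET.fromstring(MARKUP)
paths, set_a, set_b = {}, set(), set()
walk(root, "/" + root.tag, False, False, paths, set_a, set_b)

# Intersection of A and B, listed in document order.
selected = [paths[e] for e in root.iter() if e in set_a and e in set_b]
for p in selected:
    print(p)
```

With the stand-in name, the printed paths correspond to the five elements listed above. Because both sets are defined by predicates on the same //* step, an XPath engine should return the same nodes if the two predicates are simply chained, as in //*[ancestor::div[contains(@class, 'post-content')]][not(ancestor-or-self::*[contains(@class, 'image-container')])].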

Answered by helderdarocha, Nov 15 '22 12:11