XPath 2.0 - select all nodes between 2 elements

I have the following XML file:

<document>
  <article>
    <head>headline 1</head>
    <text>
      <paragraph>foo</paragraph>
      <paragraph>bar</paragraph>
    </text>
    <date>
      <day>10</day>
      <month>05</month>
      <year>2002</year>
    </date>
    <source>some text</source>
    <portal>ABC</portal>
    <ID number="1"/>
  </article>
  <article>
    <head>headline 2</head>
    <text>
      <paragraph>lorem ipsum</paragraph>
    </text>
    <date>
      <day>10</day>
      <month>05</month>
      <year>2002</year>
    </date>
    <source>another source</source>
    <portal>DEF</portal>
    <ID number="2"/>
  </article>
</document>

I'd like to return all nodes of each article that occur after the head node and before the portal node, so I looked into the XPath 2.0 node comparison operators (<< and >>).

What I have so far is the following, which returns an empty result:

<xsl:template match="/">
  <xsl:copy-of select="/document/article/head/following-sibling::*[. << ./article/portal]"/>
</xsl:template>

Any ideas on how to fix that XPath query?

mawo asked Nov 22 '25 01:11


2 Answers

Use:

/*/*/node()[. >> ../head and ../portal >> .]

Here is a complete transformation:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Emit every child node of each article that lies after head
       and before portal in document order. -->
  <xsl:template match="/">
    <xsl:sequence select="/*/*/node()[. >> ../head and ../portal >> .]"/>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<document>
    <article>
        <head>headline 1</head>
        <text>
            <paragraph>foo</paragraph>
            <paragraph>bar</paragraph>
        </text>
        <date>
            <day>10</day>
            <month>05</month>
            <year>2002</year>
        </date>
        <source>some text</source>
        <portal>ABC</portal>
        <ID number="1"/>
    </article>
    <article>
        <head>headline 2</head>
        <text>
            <paragraph>lorem ipsum</paragraph>
        </text>
        <date>
            <day>10</day>
            <month>05</month>
            <year>2002</year>
        </date>
        <source>another source</source>
        <portal>DEF</portal>
        <ID number="2"/>
    </article>
</document>

the wanted, correct result is produced:

    <text>
        <paragraph>foo</paragraph>
        <paragraph>bar</paragraph>
    </text>
    <date>
        <day>10</day>
        <month>05</month>
        <year>2002</year>
    </date>
    <source>some text</source>

    <text>
        <paragraph>lorem ipsum</paragraph>
    </text>
    <date>
        <day>10</day>
        <month>05</month>
        <year>2002</year>
    </date>
    <source>another source</source>
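
As a rough cross-check of the result above outside XSLT, the same selection can be sketched in Python with the standard-library xml.etree.ElementTree module. This is only an illustration under assumptions: the helper name between_head_and_portal is made up, each article is assumed to contain exactly one head and one portal, and only element children are considered (the node() test in the XPath expression can also match text nodes such as whitespace).

```python
import xml.etree.ElementTree as ET

# Same data as the document in the question, written compactly.
XML = """<document>
  <article>
    <head>headline 1</head>
    <text><paragraph>foo</paragraph><paragraph>bar</paragraph></text>
    <date><day>10</day><month>05</month><year>2002</year></date>
    <source>some text</source>
    <portal>ABC</portal>
    <ID number="1"/>
  </article>
  <article>
    <head>headline 2</head>
    <text><paragraph>lorem ipsum</paragraph></text>
    <date><day>10</day><month>05</month><year>2002</year></date>
    <source>another source</source>
    <portal>DEF</portal>
    <ID number="2"/>
  </article>
</document>"""


def between_head_and_portal(article):
    """Return the child elements located after <head> and before <portal>.

    Hypothetical helper: assumes exactly one <head> and one <portal> child.
    """
    children = list(article)                # children in document order
    tags = [child.tag for child in children]
    start = tags.index("head")              # position of <head>
    end = tags.index("portal")              # position of <portal>
    return children[start + 1:end]          # strictly between the two


root = ET.fromstring(XML)
selected = [
    [el.tag for el in between_head_and_portal(article)]
    for article in root.findall("article")
]
print(selected)
```

For the document in the question this prints [['text', 'date', 'source'], ['text', 'date', 'source']], which matches the elements in the output above.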

Update:

In a comment, Roman Pekar specified a new requirement: he wants all such nodes that lie between the first head and the first portal of each article.

Of course, this is straightforward -- just change the above expression to:

/*/*/node()[. >> ../head[1] and ../portal[1] >> .]

Dimitre Novatchev answered Nov 23 '25 16:11


A simple XPath 1.0 expression should work for such a case:

/document/article/head/following-sibling::*[following-sibling::portal]
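
One behavior worth checking shows up only when an article has more than one portal: the predicate [following-sibling::portal] keeps every element sibling that has some portal after it, so the selection runs up to the last portal rather than stopping at the first. Below is a minimal Python sketch of that predicate, using the standard-library xml.etree.ElementTree module. The article with two portal children is invented for illustration (it is not from the question), the helper name siblings_with_later_portal is hypothetical, and a single head per article is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical article with two <portal> children; not from the question's data.
XML = """<article>
  <head>headline</head>
  <text>body</text>
  <portal>ABC</portal>
  <source>some text</source>
  <portal>DEF</portal>
  <ID number="1"/>
</article>"""


def siblings_with_later_portal(article):
    """Mimic head/following-sibling::*[following-sibling::portal] for one article.

    Hypothetical helper: assumes the article has exactly one <head> child.
    """
    children = list(article)                # element children in document order
    tags = [child.tag for child in children]
    head_pos = tags.index("head")
    selected = []
    for pos in range(head_pos + 1, len(children)):
        # Keep the element only if some <portal> appears later among its siblings.
        if "portal" in tags[pos + 1:]:
            selected.append(children[pos])
    return selected


article = ET.fromstring(XML)
print([el.tag for el in siblings_with_later_portal(article)])
```

This prints ['text', 'portal', 'source']: the first portal is itself selected because a second portal follows it. With exactly one portal per article, as in the question's document, the result is the elements between head and portal.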

choroba answered Nov 23 '25 18:11


