
How to get xpaths for all leaf elements from XML?

Tags: xml, xslt, xpath

I am wondering if it is possible to create an XSLT stylesheet that would extract the XPaths of all leaf elements in a given XML file. E.g. for

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <item1>value1</item1>
    <subitem>
        <item2>value2</item2>
    </subitem>
</root>

The output would be

/root/item1
/root/subitem/item2

the_joric asked Jan 30 '12 13:01



4 Answers

I think the following correction only matters in unusual cases where, among sibling elements in a document, different prefixes are bound to the same namespace or the same prefix is bound to different namespaces. However, there is nothing theoretically wrong with such input, and it could be common in certain kinds of generated XML.

Anyway, the following answer fixes that case (copied-and-modified from @Kirill's answer):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="text" indent="no" />

   <xsl:template match="*[not(*)]">
      <xsl:for-each select="ancestor-or-self::*">
         <xsl:value-of select="concat('/', name())"/>

         <!-- Suggestions on how to refactor the repetition of long XPath
              expression parts are welcome. -->
         <xsl:if test="count(../*[local-name() = local-name(current())
               and namespace-uri(.) = namespace-uri(current())]) > 1">
            <xsl:value-of select="concat('[', count(
               preceding-sibling::*[local-name() = local-name(current())
               and namespace-uri(.) = namespace-uri(current())]) + 1, ']')"/>
         </xsl:if>
      </xsl:for-each>
      <xsl:text>&#xA;</xsl:text>
      <xsl:apply-templates select="*"/>
   </xsl:template>

   <xsl:template match="*">
      <xsl:apply-templates select="*"/>
   </xsl:template>

</xsl:stylesheet>

It also addresses the problem in other answers where elements that are first in a series of siblings lack a position predicate.
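
On the refactoring question raised in the stylesheet comment, one possible approach is to bind the shared values to variables. This is only a sketch and not part of the original answer; the variable names ln, ns, before and after are made up for illustration. It is intended to produce the same paths as the stylesheet above, and it leaves out the apply-templates call inside the leaf template because a leaf has no child elements to process.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="text" indent="no" />

   <xsl:template match="*[not(*)]">
      <xsl:for-each select="ancestor-or-self::*">
         <!-- Expanded name of the current step, captured once (hypothetical names) -->
         <xsl:variable name="ln" select="local-name()"/>
         <xsl:variable name="ns" select="namespace-uri()"/>

         <!-- Siblings with the same expanded name, before and after this element -->
         <xsl:variable name="before"
            select="preceding-sibling::*[local-name() = $ln and namespace-uri() = $ns]"/>
         <xsl:variable name="after"
            select="following-sibling::*[local-name() = $ln and namespace-uri() = $ns]"/>

         <xsl:value-of select="concat('/', name())"/>

         <!-- Emit a position predicate only when the name is ambiguous among siblings -->
         <xsl:if test="$before or $after">
            <xsl:value-of select="concat('[', count($before) + 1, ']')"/>
         </xsl:if>
      </xsl:for-each>
      <xsl:text>&#xA;</xsl:text>
   </xsl:template>

   <!-- Non-leaf elements: just descend -->
   <xsl:template match="*">
      <xsl:apply-templates select="*"/>
   </xsl:template>

</xsl:stylesheet>
```

The test $before or $after is true exactly when at least one sibling shares the element's expanded name, which is the same condition as the count(...) > 1 test in the original.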

E.g. for the input

<root>
   <item1>value1</item1>
   <subitem>
      <a:item xmlns:a="uri">value2</a:item>
      <b:item xmlns:b="uri">value3</b:item>
   </subitem>
</root>

this answer produces

/root/item1
/root/subitem/a:item[1]
/root/subitem/b:item[2]

which is correct.

However, like all XPath expressions, these will only work if the environment using them specifies correct bindings for the namespace prefixes used. In theory there can be more pathological documents for which the above answer generates XPath expressions that can never work (in XPath 1.0 at least) regardless of the prefix bindings. E.g. this input:

<root>
   <item1>value1</item1>
   <a:subitem xmlns:a="differentURI">
      <a:item xmlns:a="uri">value2</a:item>
      <b:item xmlns:b="uri">value3</b:item>
   </a:subitem>
</root>

produces the output

/root/item1
/root/a:subitem/a:item[1]
/root/a:subitem/b:item[2]

But the second XPath expression here can never work, since the prefix a refers to two different namespaces in the same expression.
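
One way to sidestep prefix bindings altogether, which is a different technique from the answer above, is to emit every step as a wildcard with local-name() and namespace-uri() predicates. The sketch below assumes that no namespace URI contains an apostrophe (the quoting would break otherwise) and, for simplicity, always emits a position predicate. The variable name apos is made up for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="text" indent="no" />

   <!-- A literal apostrophe, used to quote strings in the generated XPath -->
   <xsl:variable name="apos">'</xsl:variable>

   <xsl:template match="*[not(*)]">
      <xsl:for-each select="ancestor-or-self::*">
         <!-- Step that matches on expanded name, so no prefix is needed -->
         <xsl:value-of select="concat('/*[local-name()=', $apos, local-name(), $apos,
            ' and namespace-uri()=', $apos, namespace-uri(), $apos, ']')"/>

         <!-- Position among siblings that share the same expanded name -->
         <xsl:value-of select="concat('[', count(
            preceding-sibling::*[local-name() = local-name(current())
            and namespace-uri() = namespace-uri(current())]) + 1, ']')"/>
      </xsl:for-each>
      <xsl:text>&#xA;</xsl:text>
   </xsl:template>

   <!-- Non-leaf elements: just descend -->
   <xsl:template match="*">
      <xsl:apply-templates select="*"/>
   </xsl:template>

</xsl:stylesheet>
```

For the a:item element in the input above, the last step would come out as /*[local-name()='item' and namespace-uri()='uri'][1]. The resulting expressions are much longer and harder to read, but they do not depend on any prefix declarations in the environment that evaluates them.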

LarsH answered Oct 30 '22 18:10


<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text" indent="no" />

    <xsl:template match="*[not(*)]">
        <xsl:for-each select="ancestor-or-self::*">
            <xsl:value-of select="concat('/', name())"/>

            <xsl:if test="count(preceding-sibling::*[name() = name(current())]) != 0">
                <xsl:value-of select="concat('[', count(preceding-sibling::*[name() = name(current())]) + 1, ']')"/>
            </xsl:if>
        </xsl:for-each>
        <xsl:text>&#xA;</xsl:text>
        <xsl:apply-templates select="*"/>
    </xsl:template>

    <xsl:template match="*">
        <xsl:apply-templates select="*"/>
    </xsl:template>

</xsl:stylesheet>

outputs:

/root/item1
/root/subitem/item2

Kirill Polishchuk answered Oct 30 '22 19:10


This transformation:

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output omit-xml-declaration="yes" indent="yes"/>
        <xsl:strip-space elements="*"/>

        <xsl:variable name="vApos">'</xsl:variable>

        <xsl:template match="*[@* or not(*)]">
            <xsl:if test="not(*)">
                <xsl:apply-templates select="ancestor-or-self::*" mode="path"/>
                <xsl:text>&#xA;</xsl:text>
            </xsl:if>
            <xsl:apply-templates select="@*|*"/>
        </xsl:template>

        <xsl:template match="*" mode="path">
            <xsl:value-of select="concat('/',name())"/>
            <xsl:variable name="vnumSiblings" select=
             "count(../*[name()=name(current())])"/>
            <xsl:if test="$vnumSiblings > 1">
                <xsl:value-of select=
                 "concat('[',
                         count(preceding-sibling::*
                                [name()=name(current())]) +1,
                         ']')"/>
            </xsl:if>
        </xsl:template>

        <xsl:template match="@*">
            <xsl:apply-templates select="../ancestor-or-self::*" mode="path"/>
            <xsl:value-of select="concat('[@',name(), '=',$vApos,.,$vApos,']')"/>
            <xsl:text>&#xA;</xsl:text>
        </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<root>
    <item1>value1</item1>
    <subitem>
        <item2>value2</item2>
    </subitem>
</root>

produces the wanted, correct result:

/root/item1
/root/subitem/item2

With this XML document:

<root>
    <item1>value1</item1>
    <subitem>
        <item>value2</item>
        <item>value3</item>
    </subitem>
</root>

it correctly produces:

/root/item1
/root/subitem/item[1]
/root/subitem/item[2]
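
The stylesheet also contains a template for attributes, which neither sample document exercises. As an illustration, take this made-up input (the id attribute and its value are hypothetical, not from the question):

```xml
<root>
    <item1 id="a">value1</item1>
</root>
```

Reading through the templates, the leaf element is reported first and then each of its attributes is reported as a predicate on the same path, so the output should be /root/item1 followed by /root/item1[@id='a'], each on its own line. Note that the attribute value is wrapped in apostrophes as-is, so a value that itself contains an apostrophe would yield an invalid expression.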

See also this related answer: https://stackoverflow.com/a/4747858/36305

Dimitre Novatchev answered Oct 30 '22 17:10


Well, you can find leaf elements with //*[not(*)], and of course you can then for-each over the ancestor-or-self axis to output the path. But once namespaces are involved, generating XPath expressions becomes complicated.
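
A minimal sketch of that idea, assuming a document without namespaces and without repeated sibling names (it emits no position predicates, so same-named siblings would get identical paths):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="text"/>

   <xsl:template match="/">
      <!-- Every element that has no child elements is a leaf -->
      <xsl:for-each select="//*[not(*)]">
         <!-- xsl:for-each visits the selected nodes in document order,
              so the ancestors come out from the root element downwards -->
         <xsl:for-each select="ancestor-or-self::*">
            <xsl:value-of select="concat('/', name())"/>
         </xsl:for-each>
         <xsl:text>&#xA;</xsl:text>
      </xsl:for-each>
   </xsl:template>

</xsl:stylesheet>
```

For the document in the question this should print /root/item1 and /root/subitem/item2, one per line.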

Martin Honnen answered Oct 30 '22 17:10