 

XSLT custom function returning nodeset or XML fragment (not simple datatype)

Tags:

java

xslt

I am trying to develop an XSLT custom function that can return a node set or an XML fragment, for example:

Input document:

<root>
<!--
 author: blablabla
 usage: more blablabla
 labelC: [in=2] <b>formatted</b> blablabla
-->
<tag1 name="first">
    <tag2>content a</tag2>
    <tag2>content b</tag2>
    <tag3 attrib="val">content c</tag3>
</tag1>

<!--
 author: blebleble
 usage: more blebleble
 labelC: blebleble
-->
<tag1 name="second">
    <tag2>content x</tag2>
    <tag2>content y</tag2>
    <tag3 attrib="val">content z</tag3>
</tag1>
</root>

So that an XSLT template such as:

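    <xsl:template match="comment()[following-sibling::*[1][self::tag1]]" xmlns:d="java:com.dummy.func">
    <!-- match patterns cannot use the preceding axis, so match the comment directly;
         copy-of keeps the returned nodes, whereas value-of would flatten them to text -->
    <section>
     <para>
      <xsl:copy-of select="d:genDoc(.)"/>
     </para>
    </section>
    </xsl:template>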

Would produce:

    <section>
     <para>
      <author>blablabla</author>
      <usage>more blablabla</usage>
      <labelC in="2"><b>formatted</b> blablabla</labelC>
     </para>
    </section>

when matched on the comment before the first tag1, and

    <section>
     <para>
      <author>blebleble</author>
      <usage>more blebleble</usage>
      <labelC>blebleble</labelC>
     </para>
    </section>

when matched on the comment before the second tag1.

Basically, what I want to achieve with this custom function is to parse the metadata stored in the comments and use it to generate XML.

I found some examples online, for instance this one: http://cafeconleche.org/books/xmljava/chapters/ch17s03.html

According to that example, my function should return one of the following types (the org.apache.xml.dtm ones are specific to Xalan-J):

org.w3c.dom.traversal.NodeIterator,
org.apache.xml.dtm.DTM,
org.apache.xml.dtm.DTMAxisIterator,
org.apache.xml.dtm.DTMIterator,
org.w3c.dom.Node and its subtypes (Element, Attr, etc),
org.w3c.dom.DocumentFragment
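For illustration, a minimal sketch of the DocumentFragment option using only JAXP/DOM. Everything here is hypothetical: the class name GenDoc, the String parameter and the "name: value" line format are assumptions, and the [in=2] attribute syntax and the <b> markup of labelC are not handled (the value is stored as plain text). How the class is bound to a namespace URI depends on the processor; in the template it would presumably be called as <xsl:copy-of select="d:genDoc(string(.))"/>, since copy-of, not value-of, keeps the returned elements as nodes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

// Hypothetical extension class; the processor-specific namespace URI must name it.
public class GenDoc {

    // Turns each "name: value" line of the comment text into <name>value</name>
    // and returns them together as a DocumentFragment (nodes, not a string).
    public static DocumentFragment genDoc(String commentText) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        DocumentFragment frag = doc.createDocumentFragment();
        for (String line : commentText.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip blank or unstructured lines
            }
            Element el = doc.createElement(line.substring(0, colon).trim());
            el.setTextContent(line.substring(colon + 1).trim());
            frag.appendChild(el);
        }
        return frag;
    }
}
```

When such a fragment is copied into the result tree, its children (here the author and usage elements) appear directly inside <para>, which matches the desired output.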

I was able to implement a function that returns the XML as a simple String. However, this poses several other problems, the main one being that the markup characters get escaped when the string is inserted into the output.
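The escaping is expected behaviour rather than a bug: a string returned by an extension function is inserted into the result tree as a text node, and an XML serializer must escape < and & in text. The sketch below (EscapeDemo and serializeWithText are hypothetical names) reproduces the effect with the JDK's own DOM and identity Transformer, without any XSLT extension mechanism.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EscapeDemo {

    // Mimics an extension function that returns markup as a java.lang.String:
    // the string becomes a text node, so the serializer escapes its markup.
    public static String serializeWithText(String markupAsString) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element para = doc.createElement("para");
            para.appendChild(doc.createTextNode(markupAsString));
            doc.appendChild(para);

            // Identity transform, serializing the DOM to a string.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling serializeWithText("<b>formatted</b> blablabla") returns the <para> element with each < of the b tags written as &lt;, i.e. as text, not as an element. Returning nodes (a Node or DocumentFragment) avoids this.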

Does anybody have an example of how to implement such a function? I am mostly interested in how to return a proper XML node set to the calling template.

asked Nov 04 '22 08:11 by Daniele


1 Answer

The stylesheet below may get you a long way toward what you want. Note that it requires XSLT 2.0 (it is possible in XSLT 1.0 too, if you supply a replacement for tokenize(), such as a recursive named template). It also assumes a specific structure for the comment contents.
Explanation: the comment text is first split into lines on the line-feed character &#xA; (XML parsers normalize all line endings to &#xA;, so splitting on &#xD;, a carriage return, would leave the comment in one piece). Each line is then split into a name and a value at the ":" (author, usage and labelC; their order does not matter), and finally the labelC value is split into attributes and text at "] ", where a part starting with "[" is treated as an attribute.
Note that normalize-space() is used throughout to strip surplus whitespace.
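The labelC step is also where the result differs from what the question asks for: the <b> inside a comment is only text, so the stylesheet writes it out escaped (in XSLT 3.0, parse-xml-fragment() could turn it back into nodes). If the Java route is preferred, a hypothetical helper like the one below could mirror the "[attr=value]" split and then parse the remaining text as an XML fragment, wrapped in a dummy root element, so that <b> becomes a real element. It assumes the remaining text is well-formed; LabelParser, labelElement and newDocument are made-up names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the answer's stylesheet.
public class LabelParser {

    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <name a="v">...</name> from a value such as "[a=v] text <b>bold</b>".
    public static Element labelElement(Document doc, String name, String value) {
        Element el = doc.createElement(name);
        String rest = value.trim();
        // Leading "[attr=val]" groups become attributes.
        while (rest.startsWith("[")) {
            int close = rest.indexOf(']');
            if (close < 0) {
                break;
            }
            String[] attr = rest.substring(1, close).split("=", 2);
            if (attr.length == 2) {
                el.setAttribute(attr[0].trim(), attr[1].trim());
            }
            rest = rest.substring(close + 1).trim();
        }
        // Parse the rest as markup inside a dummy root and copy its children over.
        try {
            Document tmp = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<wrap>" + rest + "</wrap>")));
            for (Node n = tmp.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                el.appendChild(doc.importNode(n, true));
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("value is not well-formed: " + rest, e);
        }
        return el;
    }
}
```

For example, labelElement(doc, "labelC", "[in=2] <b>formatted</b> blablabla") yields a labelC element with in="2" whose children are a b element and the text " blablabla".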

Edit: a version of the stylesheet that wraps the logic in an xsl:function is at the bottom.

XSLT

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
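    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <!-- drop whitespace-only text nodes from the input so they are not copied to the output -->
    <xsl:strip-space elements="*"/>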

    <xsl:template match="/">
        <output>
            <xsl:apply-templates/>
        </output>
    </xsl:template>

    <xsl:template match="tag1/*">
    </xsl:template>

    <xsl:template match="comment()">
        <section>
            <para>
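                <!-- one iteration per non-empty line; XML parsers normalize line ends to &#xA; -->
                <xsl:for-each select="tokenize(., '&#xA;')[string-length() != 0]">
                    <!-- split "name: value" at the colon -->
                    <xsl:variable name="splitup" select="tokenize(normalize-space(current()), ':')"/>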
                    <xsl:choose>
                        <xsl:when test="$splitup[1]='author'">
                            <author><xsl:value-of select="normalize-space($splitup[2])"/></author>
                        </xsl:when>
                        <xsl:when test="$splitup[1]='usage'">
                            <usage><xsl:value-of select="normalize-space($splitup[2])"/></usage>
                        </xsl:when>
                        <xsl:when test="$splitup[1]='labelC'">
                            <labelC>
                                <!-- "]" is escaped because tokenize() takes a regular expression -->
                                <xsl:for-each select="tokenize($splitup[2], '\] ')[string-length() != 0]">
                                    <xsl:variable name="labelCpart" select="normalize-space(current())"/>
                                    <xsl:choose>
                                        <xsl:when test="substring($labelCpart, 1,1) = '['">
                                            <xsl:variable name="attr" select="tokenize(substring($labelCpart, 2), '=')"/>
                                            <xsl:attribute name="{$attr[1]}"><xsl:value-of select="$attr[2]"/></xsl:attribute>
                                        </xsl:when>
                                        <xsl:otherwise>
                                            <xsl:value-of select="$labelCpart"/>
                                        </xsl:otherwise>
                                    </xsl:choose>
                                </xsl:for-each>
                            </labelC>
                        </xsl:when>
                    </xsl:choose>
                </xsl:for-each>
            </para>
        </section>
    </xsl:template>

</xsl:stylesheet>

The stylesheet above, when applied to the following XML

<?xml version="1.0" encoding="UTF-8"?>
<root>
<!--
 author: blablabla
 usage: more blablabla
 labelC: [in=2] <b>formatted</b> blablabla
-->
<tag1 name="first">
    <tag2>content a</tag2>
    <tag2>content b</tag2>
    <tag3 attrib="val">content c</tag3>
</tag1>

<!--
 author: blebleble
 usage: more blebleble
 labelC: blebleble
-->
<tag1 name="second">
    <tag2>content x</tag2>
    <tag2>content y</tag2>
    <tag3 attrib="val">content z</tag3>
</tag1>
</root>

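gives the following output (the <b> markup from the comment comes out as escaped text, because comment content is plain text, not markup):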

<?xml version="1.0" encoding="UTF-8"?>
<output>
    <section>
        <para>
            <author>blablabla</author>
            <usage>more blablabla</usage>
            <labelC in="2">&lt;b&gt;formatted&lt;/b&gt; blablabla</labelC>
        </para>
    </section>
    <section>
        <para>
            <author>blebleble</author>
            <usage>more blebleble</usage>
            <labelC>blebleble</labelC>
        </para>
    </section>
</output>

Edit: the same stylesheet with the logic moved into an xsl:function that is called from the comment() template (gives the same output):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:d="java:com.dummy.func"
exclude-result-prefixes="d">

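    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <!-- drop whitespace-only text nodes from the input so they are not copied to the output -->
    <xsl:strip-space elements="*"/>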

    <xsl:template match="/">
        <output>
            <xsl:apply-templates/>
        </output>
    </xsl:template>

    <xsl:template match="tag1/*">
    </xsl:template>

    <xsl:function name="d:section">
        <xsl:param name="comm"/>
        <section>
            <para>
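                <!-- one iteration per non-empty line; XML parsers normalize line ends to &#xA; -->
                <xsl:for-each select="tokenize($comm, '&#xA;')[string-length() != 0]">
                    <!-- split "name: value" at the colon -->
                    <xsl:variable name="splitup" select="tokenize(normalize-space(current()), ':')"/>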
                    <xsl:choose>
                        <xsl:when test="$splitup[1]='author'">
                            <author><xsl:value-of select="normalize-space($splitup[2])"/></author>
                        </xsl:when>
                        <xsl:when test="$splitup[1]='usage'">
                            <usage><xsl:value-of select="normalize-space($splitup[2])"/></usage>
                        </xsl:when>
                        <xsl:when test="$splitup[1]='labelC'">
                            <labelC>
                                <!-- "]" is escaped because tokenize() takes a regular expression -->
                                <xsl:for-each select="tokenize($splitup[2], '\] ')[string-length() != 0]">
                                    <xsl:variable name="labelCpart" select="normalize-space(current())"/>
                                    <xsl:choose>
                                        <xsl:when test="substring($labelCpart, 1,1) = '['">
                                            <xsl:variable name="attr" select="tokenize(substring($labelCpart, 2), '=')"/>
                                            <xsl:attribute name="{$attr[1]}"><xsl:value-of select="$attr[2]"/></xsl:attribute>
                                        </xsl:when>
                                        <xsl:otherwise>
                                            <xsl:value-of select="$labelCpart"/>
                                        </xsl:otherwise>
                                    </xsl:choose>
                                </xsl:for-each>
                            </labelC>
                        </xsl:when>
                    </xsl:choose>
                </xsl:for-each>
            </para>
        </section>
    </xsl:function>

    <xsl:template match="comment()">
        <xsl:copy-of select="d:section(.)"/>
    </xsl:template>

</xsl:stylesheet>
answered Nov 10 '22 19:11 by Maestro13