I have the following XML:
<xml>
<para>
<number>1</number>
<text> Paragraph 1(<italic>A</italic>) is this para.</text>
</para>
</xml>
I want to match the text element if it contains a pattern that starts with the word Paragraph, followed by a space, one or more digits, "(", an italic node, and a closing ")". The match should then be wrapped in an anchor tag, so the output for the XML above should be:
<xml>
<para>
<number>1</number>
<text> <a href="Paragraph1(A)">Paragraph 1(<italic>A</italic>)</a> is this para.</text>
</para>
</xml>
That is, replace Paragraph 1(<italic>A</italic>) with an a tag whose href value is the matched text without any spaces and without the italic markup.
Any help or hints on how to handle this with a regex would be appreciated.
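This XSLT 2.0 stylesheet gets close to the desired result. Note that, as written, the anchor wraps the whole content of the text element rather than only the matched phrase: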
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output omit-xml-declaration="no" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Only our text element requires special handling here....-->
<xsl:template match="text[matches(.,'Paragraph\s+\d+\([^)]*\)')]">
<xsl:copy>
<xsl:variable name="textElement" select="."/>
<xsl:analyze-string select="." regex="(Paragraph\s+\d+)(\([^)]*\))">
<xsl:matching-substring>
<a href="{concat(replace(regex-group(1),'\s',''),regex-group(2))}">
<xsl:apply-templates select="$textElement/node()"/>
</a>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
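To make the href computation easier to follow, here is a minimal sketch of the same regex logic in Python rather than XSLT. It assumes the input is the string value of the text element (so the italic markup is already gone), uses \d+ for "one or more digits" as the question asks, and the name href_for is purely illustrative:

```python
import re

# Group 1: "Paragraph", whitespace, one or more digits.
# Group 2: the parenthesised part, up to the first closing ")".
PATTERN = re.compile(r"(Paragraph\s+\d+)(\([^)]*\))")


def href_for(text_value):
    """Return the href for the first match, or None if nothing matches."""
    match = PATTERN.search(text_value)
    if match is None:
        return None
    # Mirrors concat(replace(regex-group(1), '\s', ''), regex-group(2)).
    return re.sub(r"\s", "", match.group(1)) + match.group(2)


# String value of <text> from the question: <italic>A</italic> contributes just "A".
print(href_for(" Paragraph 1(A) is this para."))  # Paragraph1(A)
```

This only shows what the two regex groups capture. It does not rebuild the XML, which is the part the stylesheet handles.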
This can give you an idea on how you could solve it:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<!-- Only our text element requires special handling here....-->
<xsl:template match="text">
<xsl:copy>
<xsl:choose>
<xsl:when test="matches(.,'Paragraph\s+\d+')">
<!-- Save original text value here -->
<xsl:variable name="temp" select="."/>
<!-- Save the value of <italic>x</italic> child element -->
<xsl:variable name="italic_val" select="italic/text()"/>
<xsl:analyze-string select="." regex="(Paragraph\s+\d+)">
<xsl:matching-substring>
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:value-of select="concat(replace(regex-group(1),'\s',''),'(',$italic_val,')')"/>
</xsl:attribute>
<xsl:value-of select="$temp"/>
</xsl:element>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:when>
<xsl:otherwise>DOESNT MATCH</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:template>
<!-- Identity template: copies everything else unchanged -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
It basically uses the XSLT identity template to copy the original document and defines a template to handle the <text> element. That template tests the element's string value against a regex that begins with Paragraph and, if it matches, generates the anchor sub-structure. For that I use some temporary variables.
Here is my output file:
<xml>
<para>
<number>1</number>
<text><a href="Paragraph1(A)"> Paragraph 1(A) is this para.</a></text>
</para>
</xml>
I'm still missing the Paragraph 1(<italic>A</italic>) markup; what I get instead is Paragraph 1(A). But that's just some tweaking...
Take a look at this link; it may help you understand regex in XSLT.
Note that it uses XSLT 2.0.
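One possible direction for that tweak, sketched in Python rather than XSLT, is to run the regex over the serialized content of the text element so that the italic tags are part of what gets matched and kept. This is a different technique from the stylesheet above, and regexes over markup are brittle: the sketch assumes the italic element has no attributes and contains only word characters. The names PATTERN and add_anchors are illustrative.

```python
import re

# "Paragraph", whitespace, digits, "(", an <italic> element, ")".
# Group 1 is the number, group 2 is the content of <italic>.
PATTERN = re.compile(r"Paragraph\s+(\d+)\(<italic>(\w+)</italic>\)")


def add_anchors(inner_xml):
    """Wrap every match in <a href="...">, keeping the <italic> markup."""

    def wrap(match):
        # href is the matched text without spaces and without the italic tags.
        href = "Paragraph{}({})".format(match.group(1), match.group(2))
        return '<a href="{}">{}</a>'.format(href, match.group(0))

    return PATTERN.sub(wrap, inner_xml)


print(add_anchors(" Paragraph 1(<italic>A</italic>) is this para."))
#  <a href="Paragraph1(A)">Paragraph 1(<italic>A</italic>)</a> is this para.
```

Text outside the match is left untouched, so only the matched phrase ends up inside the anchor.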