
XSL transform for nested html tags

Tags:

html

xml

xslt

I have a series of documents output by a Java application that exports XML with the HTML tags unescaped, for example:

<b>some text</b>

(I cannot change this behaviour.)

The application that consumes this output needs all HTML tags escaped, like this:

&lt;b&gt;some text&lt;/b&gt;

I use the XSLT below to escape the tags, but, not surprisingly, it does not work for nested HTML tags. For example, given

<u><b>A string of html</b></u>

After the XSLT transform I get

&lt;u&gt;A string of html&lt;/u&gt;

where the nested <b> and </b> tags are removed altogether.

I am looking to achieve

&lt;u&gt;&lt;b&gt;A string of html&lt;/b&gt;&lt;/u&gt;

I am sure there's an easy answer to this, either by adjusting the value-of select or the template, but I have tried and failed dismally.

Any help would be much appreciated!

Sample doc with embedded html tags

<?xml version="1.0" encoding="UTF-8"?>
<Main>
<Text><u><b>A string of html</b></u></Text>
</Main>

This is the XSLT

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no" encoding="UTF-8"/>
<xsl:strip-space elements="*" />  

<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="Text/*">
  <xsl:value-of select="concat('&lt;',name(),'&gt;',.,'&lt;/',name(),'&gt;')" />
</xsl:template>

</xsl:stylesheet>

Which produces

<?xml version="1.0" encoding="UTF-8"?>
<Main>
  <Text>&lt;u&gt;A string of html&lt;/u&gt;</Text>
</Main>

The inner bold tags have been dropped as you can see.

Can anyone help with adjusting the XSLT?

Thank you :-)

asked Nov 20 '13 at 12:11 by user3012857



1 Answer

Try changing your current Text/* template to this:

<xsl:template match="Text//*">
  <xsl:value-of select="concat('&lt;',name(),'&gt;')" />
  <xsl:apply-templates />
  <xsl:value-of select="concat('&lt;/',name(),'&gt;')" />
</xsl:template>

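Your original template loses the inner tags because the . (the context node) in your value-of select returns the element's string value, which is only the concatenated text of all its descendant text nodes; the <b> markup itself never reaches the output. The Text//* pattern matches any descendant element of the Text element, not just the immediate children. The template then outputs the opening and closing tags separately, and in between them xsl:apply-templates processes the element's children. A nested element is therefore matched by this same template again, while plain text nodes are copied by your identity template.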
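For completeness, this is the question's stylesheet with the Text/* template swapped for the Text//* version. The identity template, the output settings and strip-space are left exactly as in the question. It only uses XSLT 1.0 features, so it should run under any 1.0 processor, but treat it as a sketch rather than something verified against every processor.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="no" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copies everything not matched below unchanged,
       including the text nodes inside the HTML fragment -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element at any depth below Text (priority 0.5, so it beats the
       identity template): write the start tag as text, process the children
       (text or further nested tags), then write the end tag as text -->
  <xsl:template match="Text//*">
    <xsl:value-of select="concat('&lt;', name(), '&gt;')"/>
    <xsl:apply-templates/>
    <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
  </xsl:template>
</xsl:stylesheet>
```

One thing to watch: because strip-space applies to every element (*), whitespace-only text nodes between inline tags inside Text are discarded too. For example, the space in <b>a</b> <i>b</i> disappears, giving &lt;b&gt;a&lt;/b&gt;&lt;i&gt;b&lt;/i&gt;. If that matters for your documents, drop the xsl:strip-space line or narrow its element list.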

When applied to your sample XML, this should output:

<?xml version="1.0" encoding="UTF-8"?>
<Main>
  <Text>&lt;u&gt;&lt;b&gt;A string of html&lt;/b&gt;&lt;/u&gt;</Text>
</Main>
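One limitation: like the question's template, this one only writes out name(), so any attributes on the HTML elements (for example href on an <a> tag) are silently dropped. If your documents contain attributes, a variant along these lines also writes each one into the escaped start tag. It is a hedged sketch, not a tested drop-in replacement.

```xml
<!-- Sketch: same idea as the Text//* template, but also writes each
     attribute as name="value" inside the escaped start tag -->
<xsl:template match="Text//*">
  <xsl:value-of select="concat('&lt;', name())"/>
  <xsl:for-each select="@*">
    <!-- &quot; becomes a literal double quote in the XPath string -->
    <xsl:value-of select="concat(' ', name(), '=&quot;', ., '&quot;')"/>
  </xsl:for-each>
  <xsl:text>&gt;</xsl:text>
  <!-- Default select is child::node(), so attributes are not copied again here -->
  <xsl:apply-templates/>
  <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
</xsl:template>
```

This sketch does not escape double quotes that occur inside attribute values. Also, with either template an empty element such as <br/> comes out as an escaped start tag followed by an escaped end tag (&lt;br&gt;&lt;/br&gt;), which the consuming application may or may not accept.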
answered Sep 18 '22 at 11:09 by Tim C
