I've got wads of autogenerated HTML doing stupid things like this:
<p>Hey it's <em>italic</em><em>italic</em>!</p>
And I'd like to mash that down to:
<p>Hey it's <em>italicitalic</em>!</p>
My first attempt was along these lines...
<xsl:template match="em/preceding::em">
  <xsl:value-of select="$OPEN_EM"/>
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="em/following::em">
  <xsl:apply-templates/>
  <xsl:value-of select="$CLOSE_EM"/>
</xsl:template>
But apparently the XSLT spec in its grandmotherly kindness forbids the use of the standard XPath preceding or following axes in template matchers. (And that would need some tweaking to handle three ems in a row anyway.)
Any solutions better than forgetting about doing this in XSLT and just running a replace('</em><em>', '') in $LANGUAGE_OF_CHOICE on the end result? Rough requirements: should not combine two <em> if they are separated by anything (whitespace, text, tags), and while it doesn't have to merge them, it should at least produce valid XML if there are three or more <em> in a row. Handling tags nested within the ems (including other ems) is not required.
(And oh, I've seen how to merge element using xslt?, which is similar but not quite the same. XSLT 2 is regrettably not an option and the proposed solutions look hideously complex.)
This is also a case of grouping adjacent elements:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Fine-grained identity: copy a node, then walk to its first child and its next sibling. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()[1]|@*"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]"/>
  </xsl:template>
  <!-- First em of a run: emit one em, merge the run, continue at the next non-em sibling. -->
  <xsl:template match="em">
    <em>
      <xsl:call-template name="merge"/>
    </em>
    <xsl:apply-templates
        select="following-sibling::node()[not(self::em)][1]"/>
  </xsl:template>
  <!-- "Break" rule: any non-em node ends the run. -->
  <xsl:template match="node()" mode="merge"/>
  <!-- Output this em's content, then try the next sibling in merge mode. -->
  <xsl:template match="em" name="merge" mode="merge">
    <xsl:apply-templates select="node()[1]"/>
    <xsl:apply-templates select="following-sibling::node()[1]"
        mode="merge"/>
  </xsl:template>
</xsl:stylesheet>
Output:
<p>Hey it's <em>italicitalic</em>!</p>
Note: This uses a fine-grained traversal identity rule (copy everything, node by node). The em rule always fires on the first em of a run, because the process is node by node; it wraps the output in a single em, calls the merge template, and then applies templates to the next sibling that is not an em. The em rule in merge mode (also named merge) applies templates to the first child (in this case just a text node, but this allows nested elements) and then to the next sibling in merge mode. The "break" rule matches any node in merge mode; because a name test beats a node type test in priority, it only fires for siblings that are not em, and it stops the process.
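To see the "don't merge if separated" requirement in action, here is a small made-up input (not from the question) with one lone em, some text, and then a run of three:

```xml
<p><em>a</em> and <em>b</em><em>c</em><em>d</em>!</p>
```

Tracing the templates by hand, the stylesheet above should give `<p><em>a</em> and <em>bcd</em>!</p>`. The lone em stays alone because the break rule fires on the text node right after it, and the three adjacent ones collapse into one.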
This transformation:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <!-- Index every non-first em of a run by the id of the non-em node preceding the run. -->
  <xsl:key name="kFollowing"
      match="em[preceding-sibling::node()[1][self::em]]"
      use="generate-id(preceding-sibling::node()[not(self::em)][1])"/>
  <!-- Identity rule. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
  <!-- First em of a run: emit one em holding the content of the whole run. -->
  <xsl:template match=
      "em[following-sibling::node()[1][self::em]
          and
          not(preceding-sibling::node()[1][self::em])
         ]">
    <em>
      <xsl:apply-templates select=
          "node()
           |
           key('kFollowing',
               generate-id(preceding-sibling::node()[1])
              )/node()"/>
    </em>
  </xsl:template>
  <!-- Later ems of a run are already merged, so drop them. -->
  <xsl:template match=
      "em[preceding-sibling::node()[1][self::em]]"/>
</xsl:stylesheet>
when applied on the following XML document (based on the provided document, but with three adjacent em elements):
<p>Hey it's <em>italic1</em><em>italic2</em><em>italic3</em>!</p>
produces the wanted, correct result:
<p>Hey it's <em>italic1italic2italic3</em>!</p>
Do note:
The use of the identity rule to copy every node as is.
The use of a key to conveniently specify the following adjacent em elements.
The overriding of the identity transform only for em elements that have adjacent em elements.
This transformation merges any number of adjacent em elements.
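One caveat worth checking, as far as I can tell: generate-id() of an empty node-set is the empty string, so if several parents each begin with a run of em elements (no non-em sibling before the run), those runs all share the key value '' and their content can leak into one another. Here is a minimal sketch of one way around that, assuming it is acceptable to fold the parent's id into the key value. The key name kFollowingScoped is made up for this example, and I have not tried it on every processor.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <!-- Assumption: prefixing the key value with the parent's id keeps runs in
       different parents apart, even when no non-em sibling precedes them. -->
  <xsl:key name="kFollowingScoped"
      match="em[preceding-sibling::node()[1][self::em]]"
      use="concat(generate-id(..), '|',
                  generate-id(preceding-sibling::node()[not(self::em)][1]))"/>

  <!-- Identity rule: copy every node as is. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- First em of a run: emit one em holding the children of the whole run. -->
  <xsl:template match="em[following-sibling::node()[1][self::em]
                          and not(preceding-sibling::node()[1][self::em])]">
    <em>
      <xsl:apply-templates select="node()
          | key('kFollowingScoped',
                concat(generate-id(..), '|',
                       generate-id(preceding-sibling::node()[1])))/node()"/>
    </em>
  </xsl:template>

  <!-- Later ems of a run: already merged into the first one, so drop them. -->
  <xsl:template match="em[preceding-sibling::node()[1][self::em]]"/>
</xsl:stylesheet>
```

Applied to `<div><p><em>a</em><em>b</em></p><p><em>c</em><em>d</em></p></div>`, this should keep the two paragraphs apart, giving `<em>ab</em>` in the first p and `<em>cd</em>` in the second.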