
How to use two different analyze-strings for one node

Tags: regex, xml, xslt

In my XSLT transformation I have two analyze-strings that I need to use to process one node. They work fine one by one, but I don't know how to put them together.

XML document looks like this:

<article>
    <title>Article 1</title>
    <text><![CDATA[Lorem ipsum dolor sit amet, s consectetur adipiscing elit. Donec lorem diam, eleifend sed mollis id, condimentum in velit.

Sed sit amet erat ac mauris adipiscing elementum. Pellentesque eget quam augue, id faucibus magna.

Ut malesuada arcu eu elit sodales sodales. Morbi tristique porttitor tristique. Praesent eget vulputate dui. Cras ut tortor massa, at faucibus ligula.]]></text>
</article>

Here's my XSLT:

<xsl:template match="/">
    <html>
        <head>
            <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
            <title>Page title</title>
        </head>
        <body>
            <xsl:for-each select="article">
                <h1><xsl:value-of select="./title"/></h1>

                <!-- This adds paragraphs tags instead of empty lines in the text -->
                <xsl:analyze-string select="./text" regex="&#xa;">
                    <xsl:non-matching-substring>
                        <p>
                            <xsl:value-of select="." disable-output-escaping="yes"/>
                        </p>
                    </xsl:non-matching-substring>
                </xsl:analyze-string> 

                <!-- This is Czech language specific. It looks for ' s ' (or other letter) and changes second space for &nbsp;. So after that it is ' s&nbsp;'. -->
                <xsl:analyze-string select="./text" regex="(\s[k/K/s/S/v/V/z/Z]\s)">
                    <xsl:matching-substring>
                        <xsl:text> </xsl:text>
                        <xsl:value-of select="replace(., ' ','')" disable-output-escaping="yes"/>
                        <xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;]]></xsl:text>
                    </xsl:matching-substring>
                    <xsl:non-matching-substring>
                        <xsl:value-of select="." disable-output-escaping="yes"/>
                    </xsl:non-matching-substring>
                </xsl:analyze-string>
            </xsl:for-each>
        </body>
    </html>
</xsl:template>

I need to apply both analyze-strings to the same text, so that the paragraphs are wrapped in <p> tags and &nbsp; is also inserted in the right places.

My desired output would look like this:

<h1>Article 1</h1>    
<p>Lorem ipsum dolor sit amet, s&nbsp;consectetur adipiscing elit. Donec lorem diam, eleifend sed mollis id, condimentum in velit.</p>
<p>Sed sit amet erat ac mauris adipiscing elementum. Pellentesque eget quam augue, id faucibus magna.</p>
<p>Ut malesuada arcu eu elit sodales sodales. Morbi tristique porttitor tristique. Praesent eget vulputate dui. Cras ut tortor massa, at faucibus ligula.</p>

Any idea how to do this? Thank you for taking your time and trying to help me.

asked May 20 '26 12:05 by johnnym26


2 Answers

Here is my tweak on Dimitre's solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" indent="yes" encoding="UTF-8"/>

 <xsl:template match="/*/text">
   <xsl:for-each select="tokenize( replace(., '\s([kKsSvVzZ])\s', ' $1&#xA0;'), '\n')">
     <p><xsl:value-of select="."/></p>
  </xsl:for-each>
 </xsl:template>

 <xsl:template match="title">
  <h1><xsl:value-of select="."/></h1>
 </xsl:template>
</xsl:stylesheet>

Notes

  1. I am not sure what you mean by "the letters s/S/v/V/k/K/z/Z". This is not valid regex, so you need to clarify. I have guessed that you meant the character class [sSvVkKzZ].
  2. Although you do not say so, the reference to the Czech language suggests that UTF-8 is a better choice of output encoding than ASCII.
  3. Likewise, the tags in the expected output suggest that html is the more appropriate serialization method.
  4. As a side benefit of choosing html serialization, we no longer need the character map, which makes the solution simpler. We can rely on the built-in character handling of html serialization.
  5. Using fn:tokenize() removes the need for the xsl:analyze-string/xsl:non-matching-substring elements, arguably resulting in a tighter solution.
  6. This solution was tested with Saxon.
  7. Variations are possible. For example, you could move the replace() call inside the xsl:value-of, which you may find more readable.
  8. The disadvantage of my solution is that it does not work with disable-output-escaping="yes". However, if you think you need this, I suggest taking a hard look at why. Any HTML needs HTML-safe encoding unless it is inside a CDATA section. There is something not right with the idea of generating HTML with disable-output-escaping turned on. Perhaps I have not fully understood the question. Could you give a use case that clarifies the point?
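
The variation in note 7 can be sketched as follows. This is an untested sketch, not part of the tested solution above. It also assumes that the input may contain empty lines between paragraphs, as the CDATA in the question does: splitting on runs of newlines ('\n+') and filtering out whitespace-only tokens is an added assumption to avoid emitting empty <p> elements.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" indent="yes" encoding="UTF-8"/>

 <xsl:template match="/*/text">
   <!-- Split on runs of newlines; the predicate drops whitespace-only tokens
        (assumption: blank lines only separate paragraphs) -->
   <xsl:for-each select="tokenize(., '\n+')[normalize-space(.)]">
     <!-- Per paragraph: turn the space after a one-letter word into U+00A0 -->
     <p><xsl:value-of select="replace(., '\s([kKsSvVzZ])\s', ' $1&#xA0;')"/></p>
   </xsl:for-each>
 </xsl:template>

 <xsl:template match="title">
  <h1><xsl:value-of select="."/></h1>
 </xsl:template>
</xsl:stylesheet>
```

Because replace() now runs on each paragraph separately, the leading \s can no longer match a newline, so a paragraph that starts with one of these letters is not merged into the previous one.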
answered May 22 '26 02:05 by Sean B. Durkin


This transformation:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes" encoding="ascii"/>

 <xsl:template match="/*/text">
  <xsl:analyze-string select=
   "replace(., '\ss\s', ' s&#xA0;')"
   regex="&#xA;">
    <xsl:non-matching-substring>
     <p><xsl:sequence select="."/></p>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>

 <xsl:template match="title">
  <h1><xsl:value-of select="."/></h1>
 </xsl:template>
</xsl:stylesheet>

When applied to the provided XML document:

<article>
  <title>Article 1</title>
<text><![CDATA[Lorem ipsum dolor sit amet, s consectetur adipiscing elit. Donec lorem diam, eleifend sed mollis id, condimentum in velit.
Sed sit amet erat ac mauris adipiscing elementum. Pellentesque eget quam augue, id faucibus magna.
Ut malesuada arcu eu elit sodales sodales. Morbi tristique porttitor tristique. Praesent eget vulputate dui. Cras ut tortor massa, at faucibus ligula.]]></text>
</article>

produces the wanted, correct result:

  <h1>Article 1</h1>
<p>Lorem ipsum dolor sit amet, s&#160;consectetur adipiscing elit. Donec lorem diam, eleifend sed mollis id, condimentum in velit.</p>
<p>Sed sit amet erat ac mauris adipiscing elementum. Pellentesque eget quam augue, id faucibus magna.</p>
<p>Ut malesuada arcu eu elit sodales sodales. Morbi tristique porttitor tristique. Praesent eget vulputate dui. Cras ut tortor massa, at faucibus ligula.</p>

Note: Programmers are discouraged from using DOE (disable-output-escaping), as it is not a mandatory feature of XSLT 2.0 and there is no guarantee that a given XSLT 2.0 processor supports it. The feature to use instead is character maps.

Then the whole transformation becomes:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"
  encoding="ascii" use-character-maps="nbsp"/>

 <xsl:character-map name="nbsp">
  <xsl:output-character
  character="&#xA0;" string="&amp;nbsp;"/>
 </xsl:character-map>

 <xsl:template match="/*/text">
  <xsl:analyze-string select=
   "replace(., '\ss\s', ' s&#xA0;')"
   regex="&#xA;">
    <xsl:non-matching-substring>
     <p><xsl:sequence select="."/></p>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>

 <xsl:template match="title">
  <h1><xsl:value-of select="."/></h1>
 </xsl:template>
</xsl:stylesheet>

and when applied to the same XML document (above), it produces the wanted, correct result:

  <h1>Article 1</h1>
<p>Lorem ipsum dolor sit amet, s&nbsp;consectetur adipiscing elit. Donec lorem diam, eleifend sed mollis id, condimentum in velit.</p>
<p>Sed sit amet erat ac mauris adipiscing elementum. Pellentesque eget quam augue, id faucibus magna.</p>
<p>Ut malesuada arcu eu elit sodales sodales. Morbi tristique porttitor tristique. Praesent eget vulputate dui. Cras ut tortor massa, at faucibus ligula.</p>
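
Both stylesheets above handle only the letter s. A minimal sketch of extending the character-map version to the other letters, assuming the intended set is k, s, v and z in either case (a reading of the question's regex, not something it states outright), uses a character class and a capture group:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"
  encoding="ascii" use-character-maps="nbsp"/>

 <!-- Serialize U+00A0 as the literal text &nbsp; -->
 <xsl:character-map name="nbsp">
  <xsl:output-character
   character="&#xA0;" string="&amp;nbsp;"/>
 </xsl:character-map>

 <xsl:template match="/*/text">
  <!-- $1 re-inserts whichever letter the character class matched -->
  <xsl:analyze-string select=
   "replace(., '\s([kKsSvVzZ])\s', ' $1&#xA0;')"
   regex="&#xA;">
    <xsl:non-matching-substring>
     <p><xsl:sequence select="."/></p>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>

 <xsl:template match="title">
  <h1><xsl:value-of select="."/></h1>
 </xsl:template>
</xsl:stylesheet>
```

Only the replace() pattern and its replacement string differ from the stylesheet above. Everything else, including the character map, is unchanged.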
answered May 22 '26 00:05 by Dimitre Novatchev


