Create HTML list from plain text within an XML structure

I have an XML file in which everything is well structured except for ordered lists. Every list item is tagged as a paragraph (<P>), with the enumeration added manually as text, e.g. (1). I want to create a valid HTML list from that source.

Using xsl:analyze-string with xsl:matching-substring and a regular expression, I was able to extract every list item, but I can't find a way to add the surrounding <ol> tags.

Here is an example:

XML source:

<Content>
    <P>(1) blah</P>
    <P>(2) blah</P>
    <P>(2) blah</P>
</Content>

What I have so far:

<xsl:variable name="text" select="/Content/*/text()"/>
<xsl:analyze-string select="$text" regex="(\(\d+\))([^(]*)">
    <xsl:matching-substring>    
        <![CDATA[<li>]]><xsl:value-of select="regex-group(2)"/><![CDATA[</li>]]>
    </xsl:matching-substring>
</xsl:analyze-string>

Output:

<li>blah</li>
<li>blah</li>
<li>blah</li>

In case you are wondering: the output as a whole has to be plain text; only the contents of the $text variable have to be output as HTML. That's why I am using <![CDATA[ ]]> sections.

asked Jan 19 '26 at 06:01 by machtwerk



1 Answer

As simple as this:

I. XSLT 2.0 solution:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

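 <!-- Wrap everything produced for the root element's children in a single <ol> -->
 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

 <!-- Each P whose text starts with "(n)" becomes an <li>; regex-group(2) is the text after the number -->
 <xsl:template match="P[matches(., '(^\(\d+\)\s*)(.*)')]">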
    <li>
        <xsl:analyze-string select="." regex="(^\(\d+\)\s*)(.*)">
            <xsl:matching-substring>
              <xsl:value-of select="regex-group(2)"/>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </li>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<Content>
    <P>(1) blah</P>
    <P>(2) blah</P>
    <P>(2) blah</P>
</Content>

the wanted, correct result is produced:

<ol>
    <li>blah</li>
    <li>blah</li>
    <li>blah</li>
</ol>
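The stylesheet above puts one <ol> around all children of Content, which assumes every child is a numbered paragraph. If numbered paragraphs can be mixed with ordinary ones, one possible XSLT 2.0 approach is xsl:for-each-group with group-adjacent, so that each run of consecutive "(n) ..." paragraphs gets its own <ol> and every other element is copied through unchanged. This is a sketch, assuming the numbered paragraphs are direct children of the root element:

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
  <!-- Split the children into runs: consecutive "(n) ..." paragraphs share
       the key true(), every other element gets the key false() -->
  <xsl:for-each-group select="*"
      group-adjacent="boolean(self::P[matches(., '^\(\d+\)')])">
   <xsl:choose>
    <xsl:when test="current-grouping-key()">
     <!-- One <ol> per run of numbered paragraphs -->
     <ol>
      <xsl:for-each select="current-group()">
       <!-- Strip the "(n)" prefix and any whitespace after it -->
       <li><xsl:value-of select="replace(., '^\(\d+\)\s*', '')"/></li>
      </xsl:for-each>
     </ol>
    </xsl:when>
    <xsl:otherwise>
     <!-- Non-list elements are copied as they are -->
     <xsl:copy-of select="current-group()"/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:for-each-group>
 </xsl:template>
</xsl:stylesheet>
```

For example, a hypothetical input with <P>Intro</P>, then <P>(1) blah</P> and <P>(2) blah</P>, then <P>Outro</P> would yield the Intro paragraph, one <ol> holding two <li>blah</li> items, and the Outro paragraph. The boolean grouping key is what separates the runs: it changes from false to true at the first numbered paragraph and back to false at the next ordinary one.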

II. XSLT 1.0 solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

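 <!-- Match P elements of the form "(n) text". floor(n) = n holds only when the
      text between "(" and ")" is a whole number: a non-numeric value becomes
      NaN, and NaN never compares equal to anything -->
 <xsl:template match=
  "P[starts-with(.,'(')
   and
     floor(substring-before(substring(.,2), ')'))
    =
     substring-before(substring(.,2), ')')
    ]">
    <li>
         <!-- Drop the "(n) " prefix; this expects exactly one space after ")" -->
         <xsl:value-of select="substring-after(., ') ')"/>
    </li>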
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the same XML document (above), the same correct result is produced:

<ol>
   <li>blah</li>
   <li>blah</li>
   <li>blah</li>
</ol>
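The question also says the overall output must be plain text, while both stylesheets above produce XML output. A minimal sketch that keeps the asker's own XSLT 2.0 approach (one xsl:analyze-string over a single string, with markup written through CDATA) but uses method="text": the <ol> opener is written once before xsl:analyze-string and the closer once after it, so every <li> ends up between them. Building $text with string-join() is an assumption, since xsl:analyze-string expects a single string rather than one text node per P:

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- Text output: the "<" and ">" written below are emitted literally, never escaped -->
 <xsl:output method="text"/>

 <xsl:template match="/">
  <!-- Assumed joining step: one string containing every "(n) ..." item -->
  <xsl:variable name="text" select="string-join(/Content/P, '')"/>

  <!-- Open the list once, before any item -->
  <xsl:text><![CDATA[<ol>]]>&#10;</xsl:text>
  <xsl:analyze-string select="$text" regex="\(\d+\)\s*([^(]*)">
   <xsl:matching-substring>
    <xsl:text><![CDATA[  <li>]]></xsl:text>
    <!-- Group 1 is the item text up to the next "(" -->
    <xsl:value-of select="normalize-space(regex-group(1))"/>
    <xsl:text><![CDATA[</li>]]>&#10;</xsl:text>
   </xsl:matching-substring>
  </xsl:analyze-string>
  <!-- Close the list once, after the last item -->
  <xsl:text><![CDATA[</ol>]]>&#10;</xsl:text>
 </xsl:template>
</xsl:stylesheet>
```

On the sample document this writes an <ol> line, three <li>blah</li> lines and an </ol> line as plain text. Because the [^(]* group stops at the next "(", an item whose text itself contains a parenthesis would be cut short; the per-P templates above do not have that limitation.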
answered Jan 20 '26 at 22:01 by Dimitre Novatchev
