 

XHTML: Enclose <div> text in paragraphs and convert <br/> to paragraphs with XSLT 1.0

Tags:

xhtml

xslt

I'm looking for a quick and easy way to convert this XML (which is similar to XHTML) with XSLT 1.0:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head/>
  <body>
    <div>Hello<a href="http://google.com">this is the first</a>line.<p>This the second.<br/>And this the third one.</p></div>
  </body>
 </html>

to this one:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head/>
  <body>
    <div>
        <p>Hello<a href="http://google.com">this is the first</a>line.</p>
        <p>This the second.</p>
        <p>And this the third one.</p>
    </div>
  </body>
 </html>

I was thinking of a tree-walk algorithm in XSLT 1.0. The tricky parts are, for example, the enclosed <a> links. Also, existing <p> elements should not be removed.

Can somebody help me with this? Thanks a lot.

asked Aug 09 '11 11:08 by therealmarv




1 Answer

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes" />
 <xsl:strip-space elements="*"/>

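 <!-- Identity rule: copy every node and attribute not matched below -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- A div with loose text and a p: wrap everything before the first p
      in a new p, then process the p and whatever follows it
      (attributes of the div are not copied) -->
 <xsl:template match="div[text() and p]">
  <div>
   <p>
     <xsl:apply-templates select="node()[not(self::p or preceding-sibling::p)]"/>
   </p>
   <xsl:apply-templates select="p | p/following-sibling::node()"/>
  </div>
 </xsl:template>

 <!-- A p holding text and a br: drop the wrapper and process its children -->
 <xsl:template match="p[text() and br]">
  <xsl:apply-templates/>
 </xsl:template>

 <!-- A text node directly before or after a br inside a p becomes its own p -->
 <xsl:template match=
  "p/text()
    [preceding-sibling::node()[1][self::br]
    or
     following-sibling::node()[1][self::br]
    ]">
  <p><xsl:value-of select="."/></p>
 </xsl:template>

 <!-- The br itself is dropped; the new p elements replace it -->
 <xsl:template match="p/br"/>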
</xsl:stylesheet>

when applied to the provided XML document:

<html>
    <head/>
    <body>
        <div>Hello
            <a href="http://google.com">this is the first</a>line.
            <p>This the second.<br/>And this the third one.</p>
        </div>
    </body>
</html>

produces the wanted, correct result:

<html>
   <head/>
   <body>
      <div>
         <p>Hello
            <a href="http://google.com">this is the first</a>line.
            </p>
         <p>This the second.</p>
         <p>And this the third one.</p>
      </div>
   </body>
</html>
answered Sep 23 '22 23:09 by Dimitre Novatchev
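The stylesheet in the answer targets the exact shape of the question's input: loose text before one p, and br only inside a p. If the input varies more, a more general version of the tree walk the question mentions can be sketched with XSLT 1.0 sibling recursion. The sketch below is not from the answer and rests on a few assumptions: elements are in no namespace (as in the question's sample), only p and br are treated as paragraph boundaries, and the template name "runs" and mode "run" are hypothetical names chosen for illustration. Each maximal run of inline nodes (text, a, and so on) between boundaries is wrapped in a new p; an existing p without a br is copied unchanged.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <!-- Caveat: this also strips whitespace-only text between inline elements -->
 <xsl:strip-space elements="*"/>

 <!-- Identity rule: copy everything not matched more specifically -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- div: keep the element and its attributes, wrap each inline run in a p -->
 <xsl:template match="div">
  <xsl:copy>
   <xsl:apply-templates select="@*"/>
   <xsl:call-template name="runs"/>
  </xsl:copy>
 </xsl:template>

 <!-- p containing a br: replace it with one p per br-separated run -->
 <xsl:template match="p[br]">
  <xsl:call-template name="runs"/>
 </xsl:template>

 <!-- Hypothetical helper: walk the children of the current element -->
 <xsl:template name="runs">
  <xsl:for-each select="node()">
   <xsl:choose>
    <!-- existing p: process normally (kept as is, or split if it holds a br) -->
    <xsl:when test="self::p">
     <xsl:apply-templates select="."/>
    </xsl:when>
    <!-- br only marks a boundary, so it is dropped -->
    <xsl:when test="self::br"/>
    <!-- first node of a run: no inline sibling directly before it -->
    <xsl:when test="not(preceding-sibling::node()[1][not(self::p or self::br)])">
     <p>
      <xsl:apply-templates select="." mode="run"/>
     </p>
    </xsl:when>
    <!-- any other inline node is emitted by the run that contains it -->
   </xsl:choose>
  </xsl:for-each>
 </xsl:template>

 <!-- Sibling recursion: copy this node, then continue with the next
      sibling as long as it is still inline (neither p nor br) -->
 <xsl:template match="node()" mode="run">
  <xsl:apply-templates select="."/>
  <xsl:apply-templates mode="run"
    select="following-sibling::node()[1][not(self::p or self::br)]"/>
 </xsl:template>
</xsl:stylesheet>
```

On the question's input this should yield the same div structure as the desired output. Unlike the answer, it keeps inline elements that sit next to a br inside their paragraph (for example, <p>x<a>l</a><br/>y</p> becomes <p>x<a>l</a></p><p>y</p>), handles a br placed directly in a div, and preserves the div's attributes. If the real input declares the XHTML namespace, the match patterns and the literal p would need a prefix bound to http://www.w3.org/1999/xhtml.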
