
Indenting XML using XSL

Tags:

xslt

What XSL script will indent my data?

For example:

 <dtd name="cited">
 <XMLDOC>
 <cited year="2010">
 <case>
 No.&nbsp;275 v. M.N.R. 
 <cite>
 <yr>
 2010 
 <pno cite="20101188">10</pno> 
 </yr>
 </cite>
 </case>
 </cited>
 </XMLDOC>
 <XMLDOC>
 <case>
 Wellesley St.
 <cite>
 <yr>
 2010 
 <pno cite="20105133">9</pno> 
 </yr>
 </cite>
 </case>
 </XMLDOC>
 </dtd>

To:

<dtd name="cited">
  <XMLDOC>
    <cited year="2010"></cited>
    <case>
      No.&nbsp;275 v. M.N.R.
    </case> 
    <cite>
    </cite>
    <yr>
      2010 
    </yr>
    <pno cite="20101188">10</pno> 
  </XMLDOC>
  <XMLDOC>
    <case>
      Wellesley St 
    </case>
    <cite>
    </cite>
    <yr>
      2010 
    </yr>
    <pno cite="20105133">9</pno> 
  </XMLDOC>
</dtd>

Thank you!

Related

sgml to xml convertion

From comments:

what i want is to apply the correct closing tags like

<yr></yr>
<pno cite="20101188">10</pno>

instead of

<yr>
2010 
<pno cite="20101188">10</pno>
</yr>
atif asked Dec 15 '10 20:12




2 Answers

Use a simple identity transformation with indent="yes" specified on the <xsl:output> declaration:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

This transformation, when applied to the provided XML document (with the undefined entity &nbsp; replaced by the corresponding character reference &#xA0;):

 <dtd name="cited">
 <XMLDOC>
 <cited year="2010">
 <case>
 No.&#xA0;275 v. M.N.R.
 <cite>
 <yr>
 2010
 <pno cite="20101188">10</pno>
 </yr>
 </cite>
 </case>
 </cited>
 </XMLDOC>
 <XMLDOC>
 <case>
 Wellesley St.
 <cite>
 <yr>
 2010
 <pno cite="20105133">9</pno>
 </yr>
 </cite>
 </case>
 </XMLDOC>
 </dtd>

produces, when run with AltovaXML:

<dtd name="cited">
    <XMLDOC>
        <cited year="2010">
            <case>
 No. 275 v. M.N.R.
 <cite>
                    <yr>
 2010
 <pno cite="20101188">10</pno></yr>
                </cite></case>
        </cited>
    </XMLDOC>
    <XMLDOC>
        <case>
 Wellesley St.
 <cite>
                <yr>
 2010
 <pno cite="20105133">9</pno></yr>
            </cite></case>
    </XMLDOC>
</dtd>

The same transformation, when run with Saxon 6.5.4, produces:

<dtd name="cited">

   <XMLDOC>

      <cited year="2010">

         <case>
 No. 275 v. M.N.R.
 <cite>

               <yr>
 2010
 <pno cite="20101188">10</pno>

               </yr>

            </cite>

         </case>

      </cited>

   </XMLDOC>

   <XMLDOC>

      <case>
 Wellesley St.
 <cite>

            <yr>
 2010
 <pno cite="20105133">9</pno>

            </yr>

         </cite>

      </case>

   </XMLDOC>

</dtd>

So the output differs considerably depending on which XSLT 1.0 processor is used. Saxon keeps the whitespace-only text nodes of the source document, and these, combined with the whitespace added by indentation, produce too much white space.

The workaround is to explicitly cause stripping of the whitespace-only nodes using:

<xsl:strip-space elements="*"/>

So, when this transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

is run with Saxon against the same source XML document, the output is now:

<dtd name="cited">
   <XMLDOC>
      <cited year="2010">
         <case>
 No. 275 v. M.N.R.
 <cite>
               <yr>
 2010
 <pno cite="20101188">10</pno>
               </yr>
            </cite>
         </case>
      </cited>
   </XMLDOC>
   <XMLDOC>
      <case>
 Wellesley St.
 <cite>
            <yr>
 2010
 <pno cite="20105133">9</pno>
            </yr>
         </cite>
      </case>
   </XMLDOC>
</dtd>

AltovaXML and a number of other XSLT 1.0 processors (.NET's XslCompiledTransform and XslTransform) also produce nicely indented output when running the last transformation.
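
One caveat with elements="*": whitespace-only text nodes are stripped everywhere, including in mixed content, where a single space between two inline elements may be significant. A minimal sketch of how an exception could be declared, assuming (purely for illustration) that whitespace-only text inside case must be kept. xsl:preserve-space is a standard XSLT 1.0 declaration, and the specific name test case takes precedence over the wildcard *:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <!-- strip whitespace-only text nodes everywhere... -->
 <xsl:strip-space elements="*"/>
 <!-- ...except directly inside <case> (assumed exception, for illustration) -->
 <xsl:preserve-space elements="case"/>

 <!-- identity rule: copy every node and attribute as-is -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

The exception applies only to text nodes that are direct children of case. Whitespace-only nodes inside its descendants (cite, yr) are still stripped.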

UPDATE:

In recent comments, the OP revealed an important new requirement, which makes this problem more than just "indentation".

From comments:

what i want is to apply the correct closing tags like

<yr></yr>  
<pno cite="20101188">10</pno>  

instead of

<yr>  
2010   
<pno cite="20101188">10</pno>  
</yr>

Here is the transformation that produces the wanted output:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="yr">
  <yr>
    <xsl:apply-templates select="text()[1]"/>
  </yr>
  <xsl:apply-templates select="*"/>
 </xsl:template>
</xsl:stylesheet>
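
The target output in the question appears to apply the same restructuring to cited, case and cite as well, not only to yr: each element is closed after its own text, and its child elements follow it as siblings. A possible generalization of the yr template, offered as a sketch under that reading of the question (the list of element names in the match pattern is an assumption taken from the sample document):

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- identity rule: copy every node and attribute as-is -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- assumed list of elements to flatten: keep attributes and text
      inside the element, then emit its child elements after it -->
 <xsl:template match="cited|case|cite|yr">
  <xsl:copy>
   <xsl:apply-templates select="@*|text()"/>
  </xsl:copy>
  <xsl:apply-templates select="*"/>
 </xsl:template>
</xsl:stylesheet>
```

Unlike the yr template above, this copies all text children (text()) rather than only the first one (text()[1]). An element left without content, such as cite, may be serialized as <cite/> rather than <cite></cite>; the two forms are equivalent XML.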
Dimitre Novatchev answered Sep 29 '22 18:09


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- output xml and indent -->
  <xsl:output method="xml" indent="yes"/>
  <!-- copy all elements and their attributes -->
  <xsl:template match="* | @*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
dacracot answered Sep 29 '22 19:09