Whitespace stripping with XslCompiledTransform

I'm trying to migrate a large app from XslTransform to compiled xsl files and XslCompiledTransform.

The app uses the Xsl to create HTML files, and the transformation data (Xml) was passed to the Xsl as an XmlDataDocument returned from the database.

I've changed all that, so now I do (at least temporarily):

C#

 public string ProcessCompiledXsl(XmlDataDocument xml)
 {
       StringBuilder stringControl = new StringBuilder();
       XslCompiledTransform xslTran = new XslCompiledTransform();

       // Load the stylesheet type from the precompiled assembly
       xslTran.Load(
           System.Reflection.Assembly.Load("CompiledXsl").GetType(dllName)
       );

       // Dispose the writer so that buffered output is flushed into the StringBuilder
       using (XmlWriter writer = XmlWriter.Create(stringControl, othersettings))
       {
           xslTran.Transform(xml, this.Arguments, writer, null);
       }

       return stringControl.ToString();
 }

XSL (just an example)

...
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>

Problem

That works, but the transformation strips the whitespace between the tags, so the output is:

<a href="#">
   some text
</a><a href="#">
   some text
</a><a href="#">
   some text
</a><a...etc

I've tried:

  • Using xml:space="preserve" but I couldn't get it to work
  • Overriding the OutputSettings, but I didn't get any good results (maybe I missed something)
  • Using an xsl:output method="xml", and that works, but creates self closing tags and a lot of other problems

So I don't know what to do. Maybe I'm not doing something right. Any help is really appreciated.

Thanks!

EDIT

For future reference: if you want to tackle this problem while leaving every XSL file intact, you could try this C# class I wrote, named CustomHtmlWriter.

Basically, I extended XmlTextWriter and overrode the methods that write the start and the end of every tag.

In this particular case, you would use it like this:

    StringBuilder sb = new StringBuilder();
    CustomHtmlWriter writer = new CustomHtmlWriter(sb);

    xslTran.Transform(nodeReader, this.Arguments, writer);

    return sb.ToString();

Hope it helps someone.

nicosantangelo asked Aug 31 '12 15:08



4 Answers

I. Solution 1:

Let me first analyze the problem here:

Given this source XML document (invented, as you haven't provided any):

<Object>
 <Table>

 </Table>

 <Table>

 </Table>

 <Table>

 </Table>

 <Table>

 </Table>
</Object>

This transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>
<!--
 <xsl:template match="Table">
   <a href="#">
    Table here
   </a>
 </xsl:template>
 -->
</xsl:stylesheet>

exactly reproduces the problem -- the result is:

<a href="#">
                     some text
              </a><a href="#">
                     some text
              </a><a href="#">
                     some text
              </a><a href="#">
                     some text
              </a>

Now, just uncomment the commented template and comment out the first template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html" indent="yes"/>
<!--
  <xsl:template match="/">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
  </xsl:template>
 -->
 <xsl:template match="Table">
   <a href="#">
    Table here
   </a>
 </xsl:template>
</xsl:stylesheet>

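The result has the wanted indentation, because the built-in templates now copy the whitespace-only text nodes that sit between the Table elements of the source document to the output: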

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

 <a href="#">
    Table here
   </a>

That was Solution 1.


II. Solution 2:

This solution keeps the required modifications to your existing XSLT code to a minimum.

This is a two-pass transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="ext">
 <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:variable name="vrtfPass1">
       <xsl:for-each select="//Object/Table">
              <a href="#">
                     some text
              </a>
       </xsl:for-each>
    </xsl:variable>

    <xsl:apply-templates select=
        "ext:node-set($vrtfPass1)" mode="pass2"/>
  </xsl:template>

 <xsl:template match="node()|@*" mode="pass2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="pass2"/>
  </xsl:copy>
 </xsl:template>

  <xsl:template mode="pass2" match="*[preceding-sibling::node()[1][self::*]]">
   <xsl:text>&#xA;</xsl:text>
   <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

The idea is that we don't touch the logic of the existing code at all. We capture its output in a variable and, with only a few lines of additional code, format that output to get the wanted final appearance.

When this transformation is applied to the same XML document, the wanted result is produced:

<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>
<a href="#">
                     some text
              </a>

Finally, here is a demonstration of how this minor change can be introduced without touching any existing XSLT code at all:

Let's have this existing code in c:\temp\delete\existing.xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="html"/>

  <xsl:template match="/">
    <xsl:for-each select="//Object/Table">
      <a href="#">
        some text
      </a>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

If we run this we get the problematic output.

Now, instead of running existing.xsl, we run this transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="ext">
 <xsl:import href="file:///c:/temp/delete/existing.xsl"/>
 <xsl:output method="html"/>


  <xsl:template match="/">
    <xsl:variable name="vrtfPass1">
       <xsl:apply-imports/>
    </xsl:variable>

    <xsl:apply-templates select=
        "ext:node-set($vrtfPass1)" mode="pass2"/>
  </xsl:template>

 <xsl:template match="node()|@*" mode="pass2">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*" mode="pass2"/>
  </xsl:copy>
 </xsl:template>

  <xsl:template mode="pass2" match="*[preceding-sibling::node()[1][self::*]]">
   <xsl:text>&#xA;</xsl:text>
   <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>

The result is the wanted one, and the existing code is left completely untouched:

<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>
<a href="#">
        some text
      </a>

Explanation:

  1. Using xsl:import, we import the existing stylesheet that sits at the top of the import-precedence hierarchy (the one that is not imported by any other stylesheet).

  2. We capture the output of the existing transformation in a variable. Its value is of the infamous RTF (Result Tree Fragment) type, which must be converted to a regular tree before it can be processed further.

  3. The key moment is performing xsl:apply-imports when capturing the output of the transformation. This ensures that any template from the existing code (even one that we override, such as the template matching /) is selected for execution, just as when the existing transformation is performed by itself.

  4. We convert the RTF into a regular tree using the msxsl:node-set() extension function, bound to the ext prefix in the code above (XslCompiledTransform also supports the EXSLT node-set() extension function).

  5. We perform our cosmetic adjustments on the regular tree produced this way.
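
Step 4 mentions the EXSLT node-set() function as an alternative. A minimal sketch of the same wrapper stylesheet using it could look like this, assuming the prefix exsl (my choice, any prefix works) is bound to the EXSLT common namespace; everything else is copied from the wrapper above:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:exsl="http://exslt.org/common" exclude-result-prefixes="exsl">
 <xsl:import href="file:///c:/temp/delete/existing.xsl"/>
 <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- Pass 1: run the imported (existing) templates and capture their output -->
    <xsl:variable name="vrtfPass1">
       <xsl:apply-imports/>
    </xsl:variable>

    <!-- Pass 2: convert the RTF to a node-set with the EXSLT function -->
    <xsl:apply-templates select="exsl:node-set($vrtfPass1)" mode="pass2"/>
  </xsl:template>

  <!-- Identity rule for the second pass -->
  <xsl:template match="node()|@*" mode="pass2">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" mode="pass2"/>
    </xsl:copy>
  </xsl:template>

  <!-- Put a line feed before any element that directly follows another element -->
  <xsl:template mode="pass2" match="*[preceding-sibling::node()[1][self::*]]">
    <xsl:text>&#xA;</xsl:text>
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

The only differences from the msxsl version are the namespace declaration and the prefix on node-set(), so this variant may be preferable if the stylesheets also need to run on other processors that implement the EXSLT function.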

Do Note:

This represents a general algorithm for post-processing existing transformations without touching the existing code.

Dimitre Novatchev answered Sep 27 '22 22:09


I think the problem is:

  <xsl:output method="html" indent="yes"/> 

If I remember correctly, the html output method only cares about whitespace that affects how the HTML will be displayed.

If you try:

  <xsl:output method="xml" indent="yes"/> 

Then it should create the indented whitespace you expect.

Nick Jones answered Sep 27 '22 22:09


Whitespace-only text nodes in the stylesheet are always ignored, unless they are contained in xsl:text. If you want to output whitespace to the result tree, use xsl:text.

(It's also possible to use xml:space="preserve" in the stylesheet, but it's generally not advisable as it has unwanted side-effects.)
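
As a minimal sketch of the xsl:text suggestion, here is the example template from the question with an explicit line feed added after each link. The xsl:stylesheet wrapper is an assumption, since the question elides the top of its stylesheet with "...":

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <xsl:for-each select="//Object/Table">
      <a href="#">
        some text
      </a>
      <!-- Text inside xsl:text is never stripped, so this line feed reaches the output -->
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With this change, each closing </a> should be followed by a line break in the output, because the line feed no longer depends on a whitespace-only text node of the stylesheet surviving.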

Michael Kay answered Sep 28 '22 00:09


I don't remember the details of XML/XSLT space preservation off the top of my head, but one instance where it's more likely to discard whitespace is between elements where there is no non-whitespace text (i.e. whitespace-only text nodes, like the one between </a> and </xsl:for-each>). You can prevent this by using the <xsl:text> element.

For example, after

          <a href="#">
                 some text
          </a>

put

          <xsl:text>&#10;</xsl:text>

i.e. a literal line-feed character.

Does that meet your requirements?

LarsH answered Sep 27 '22 23:09