
Which is the better approach to remove the redundant white space in XML [strip-space or indent="no"]?

Tags:

xml

xslt

I want my output XML to be printed on a single line (when viewed in Notepad or another simple text editor), so that the redundant white space in my XML file is removed. Which is the better method for that?

I think there are two options:
1) Use

  <xsl:output method="xml" indent="no"/>

2) Or use

  <xsl:strip-space elements="*"/>

Which is more efficient, and why?
Some people have suggested that I use indent="no".

I believed that strip-space was best suited, but I am not sure because of the suggestions given by others.

To elaborate, here is an example.
Input XML:

<root>
 <node>
   <child1/>
   <child2/>
 </node>
</root>

and the output required is:

<root><node><child1/><child2/></node></root>
InfantPro'Aravind' asked Feb 26 '10 13:02




2 Answers

To eliminate anything that looks like "indentation", it may be necessary (that is, there are cases where you need) to use both <xsl:strip-space> and indent="no".

Take the simplest example: the identity transformation. With neither of the two methods specified, the transformation reproduces the white-space-only text nodes from the source XML document. That is, if the source XML document is indented, the transformation produces an indented result, too.

Now add <xsl:output indent="no" /> to this transformation. This instructs the XSLT processor not to perform "pretty-printing" of its own. However, the whitespace-only nodes from the source XML document are still copied to the output, so the result document still looks indented (because the source document is indented).

Now, as a last step, add <xsl:strip-space elements="*"/>. You have specified both methods of preventing white-space-only nodes in the output. What happens? No white-space-only nodes are processed at all by the XSLT processor, and it does not indent the output -- you get your desired one-line dense output.

Finally, as a counter-check, change <xsl:output indent="no" /> to <xsl:output indent="yes" />. The <xsl:strip-space elements="*"/> is still there, so no whitespace-only nodes from the source are reproduced in the output. But the XSLT processor obeys the <xsl:output indent="yes" /> directive and adds indentation whitespace of its own.

So, of the four possible combinations, only specifying both <xsl:strip-space elements="*"/> and <xsl:output indent="no" /> guarantees that no indentation is produced, either by whitespace-only nodes from the source XML document or on the XSLT processor's own initiative.

Even this last case, of course, doesn't completely guarantee that the output won't be indented: if the XSLT programmer intentionally includes indentation code such as

<xsl:text>

</xsl:text>

the output will contain this indentation.
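
Putting the combination described above together, a complete stylesheet might look like the following minimal sketch. It assumes XSLT 1.0 and the identity transformation as the only processing; the omit-xml-declaration="yes" attribute is an addition not mentioned above, included only so that nothing precedes the root element in the output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer not to add indentation of its own.
       omit-xml-declaration is an assumption made for this sketch. -->
  <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>

  <!-- Remove whitespace-only text nodes from the source tree
       before any template sees them. -->
  <xsl:strip-space elements="*"/>

  <!-- Identity transformation: copy every node and attribute as-is. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the indented input from the question, this should yield the single-line form the question asks for. Two caveats: in XSLT 1.0 the default value of indent for the xml output method is already "no", so stating it mostly documents the intent; and elements="*" also strips whitespace-only text between inline elements in mixed content (for example the space in `<p><b>a</b> <i>b</i></p>`), so a narrower element list or <xsl:preserve-space> may be the safer choice for document-style XML.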

Dimitre Novatchev answered Oct 21 '22 11:10



Performance differences are best measured. XSLT processor implementations differ, and you should run the test yourself (though I suspect that worrying about the performance of one or the other falls into the "premature optimization" category in this case).

<xsl:output indent="no" /> might not have the effect you want unless accompanied by

<xsl:template match="text()[normalize-space()='']" />

because if whitespace nodes (the ones between your tags) are not removed, then they will appear in the output at some point, regardless of the "output" setting.
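
As a rough sketch of this template-based variant (assuming XSLT 1.0 and an identity transformation as the base, which this answer does not spell out), the empty template is written with the predicate form `text()[normalize-space()='']`, since `text()` itself takes no arguments:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer not to add indentation of its own. -->
  <xsl:output method="xml" indent="no"/>

  <!-- Identity transformation: copy every node and attribute as-is. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: text nodes that are empty after whitespace
       normalization produce no output. The predicate gives this
       pattern a higher default priority than node(), so it wins. -->
  <xsl:template match="text()[normalize-space()='']"/>

</xsl:stylesheet>
```

Unlike <xsl:strip-space>, which removes the nodes from the source tree up front, this template only suppresses them where templates are actually applied; content copied wholesale with <xsl:copy-of> would keep its whitespace.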

Tomalak answered Oct 21 '22 10:10
