 

How to just get plain text and line breaks using XSL

Tags:

xslt

With this input

<?xml version="1.0" encoding="UTF-8"?> <data> 
This is a senstence   
this is another sentence

<section>
        <!--comment --><h2>my H2</h2>     <p>some paragraph</p>             <p>another paragraph</p>                 
    </section> </data>

I need to apply an XSL stylesheet to obtain just the plain text, honor the line breaks, and remove preceding whitespace. After searching online for a few samples I tried the following, but it does not work for me. Sorry, I'm not familiar with XSL and thought I'd ask.

Attempted XSL, but it does not work. Any ideas?

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8"/>
    <xsl:strip-space elements="*" />

        <xsl:template match ="@* | node()">
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>    
        </xsl:template>

        <xsl:template match="h1|h2">
            <xsl:text>
            </xsl:text>
            <xsl:copy>
                <xsl:apply-templates select="@* | node()"/>
            </xsl:copy>  
        </xsl:template>
</xsl:stylesheet>

This is the output after applying the XSL. As you can see, it's all on one line, with no carriage returns.

This is a sentence this is another sentence my H2some paragraphanother paragraph

This is the output I'd like to get. Text within h1, h2, and h3 elements should have a line break before and after.

This is a sentence 
this is another sentence 

my H2

some paragraph
another paragraph
Jose Leon asked Oct 03 '22 07:10


1 Answer

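You need a carriage return before and after the content of the h1 and h2 elements. Put each one in an xsl:text element whose closing tag starts the next line, so that only the newline, and not the stylesheet's indentation, is output. I added xml:space="preserve" to make the intent explicit, although XSLT always preserves whitespace inside xsl:text, so the attribute is not strictly required: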

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:strip-space elements="*" />

  <xsl:template match ="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="h1|h2">
    <xsl:text xml:space="preserve">
</xsl:text>
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
    <xsl:text xml:space="preserve">
</xsl:text>
  </xsl:template>
</xsl:stylesheet>

The initial text (This is a senstence, this is another sentence) is output correctly on separate lines in my case (using Visual Studio 2012 to execute the XSLT).
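If you would rather not depend on where the closing </xsl:text> tag sits in the source, a common XSLT idiom is to write the line feed as the character reference &#10;. Below is a sketch of the same h1/h2 template written that way, assuming the rest of the stylesheet above stays as it is. With method="text" only text nodes are serialized, so a plain xsl:apply-templates on the children gives the same output as the xsl:copy version.

```xml
<xsl:template match="h1|h2">
  <!-- &#10; is a line feed; as a character reference it does not depend
       on how the stylesheet itself is indented -->
  <xsl:text>&#10;</xsl:text>
  <!-- Text output ignores element nodes, so processing the children is
       enough; xsl:copy would add nothing to the result -->
  <xsl:apply-templates/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>
```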

You wrote that only the h tags should get line breaks. In your sample, some paragraph and another paragraph are in p elements, so no line breaks are added and they are output on the same line.
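If you also want each p on its own line, h3 treated as a heading, and the whitespace at the start of each line of loose text removed, a complete XSLT 1.0 stylesheet could look roughly like the sketch below. It was written against the sample input only: the template name trim-lines is an illustrative choice, and elements other than h1, h2, h3 and p (such as data and section) fall through to XSLT's built-in templates, which process their children and ignore comments.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <!-- Drop whitespace-only text nodes, such as the gaps between </p> and <p> -->
  <xsl:strip-space elements="*"/>

  <!-- Headings: line break before, trimmed heading text, blank line after.
       &#10; is a line feed. -->
  <xsl:template match="h1|h2|h3">
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="normalize-space()"/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- Paragraphs: trimmed text followed by a single line break -->
  <xsl:template match="p">
    <xsl:value-of select="normalize-space()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Loose text (directly inside data, section, ...): keep its line breaks
       but normalize the whitespace of each line, one line per recursive call -->
  <xsl:template match="text()" name="trim-lines">
    <xsl:param name="s" select="."/>
    <xsl:choose>
      <xsl:when test="contains($s, '&#10;')">
        <xsl:value-of select="normalize-space(substring-before($s, '&#10;'))"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:call-template name="trim-lines">
          <xsl:with-param name="s" select="substring-after($s, '&#10;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- Last (or only) line: no trailing line break is added -->
        <xsl:value-of select="normalize-space($s)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Note that normalize-space() also collapses runs of spaces inside a line and drops trailing spaces, and depending on the whitespace in the input you may get an extra blank line around headings.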

MiMo answered Oct 07 '22 00:10