I am extracting data from XML using XSLT 2.0. The data has long lines, and I want to fit them to the window width by breaking the lines automatically.
Is this possible in XSLT?
You can use the standard XSLT 2.0 function unparsed-text() to read a text file directly in your XSLT 2.0 code.
Then just use:
replace(concat(normalize-space($text),' '),
        '(.{0,60}) ',
        '$1&#xA;')
Explanation:
This first normalizes the white space, deleting the leading and trailing sequences of whitespace-only characters and replacing any inner such sequence with a single space.
Then the result of the normalization is used as the first argument to the standard XPath 2.0 function replace().
The pattern matches the longest possible sequence of at most 61 characters that ends with a space: up to 60 characters, followed by the space itself.
The replacement argument specifies that any such sequence found should be replaced by the string before the ending space, concatenated with a NL character.
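To see the pattern at work on something small, here is a minimal sketch that wraps a short literal string at 10 characters instead of 60. The sample string, the variable name vSample, and the width of 10 are assumptions chosen only for illustration; the stylesheet ignores the content of its input document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Sample text and the width of 10 are illustrative only -->
  <xsl:variable name="vSample" select="'  the quick   brown fox jumps  '"/>

  <xsl:template match="/">
    <!-- normalize-space() collapses the whitespace; the appended space
         lets the last word match the pattern as well -->
    <xsl:sequence select=
      "replace(concat(normalize-space($vSample), ' '),
               '(.{0,10}) ',
               '$1&#xA;')"/>
  </xsl:template>
</xsl:stylesheet>
```

This should produce three lines: "the quick", "brown fox" and "jumps", each followed by a newline. Note that the newline is written as the character reference &#xA;, because a literal line break inside an attribute value is normalized to a space by the XML parser.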
Here is a complete solution, reading and formatting this text from the file C:\temp\delete\text.txt:
Dec. 13 — As always for a presidential inaugural, security and surveillance were
extremely tight in Washington, DC, last January. But as George W. Bush prepared to
take the oath of office, security planners installed an extra layer of protection: a
prototype software system to detect a biological attack. The U.S. Department of
Defense, together with regional health and emergency-planning agencies, distributed
a special patient-query sheet to military clinics, civilian hospitals and even aid
stations along the parade route and at the inaugural balls. Software quickly
analyzed complaints of seven key symptoms — from rashes to sore throats — for
patterns that might indicate the early stages of a bio-attack. There was a brief
scare: the system noticed a surge in flulike symptoms at military clinics.
Thankfully, tests confirmed it was just that — the flu.
The XSLT code:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <!-- Read the whole text file into a string -->
  <xsl:variable name="vText" select=
    "unparsed-text('file:///c:/temp/delete/text.txt')"/>

  <xsl:template match="/">
    <xsl:sequence select=
      "replace(concat(normalize-space($vText),' '),
               '(.{0,60}) ',
               '$1&#xA;')"/>
  </xsl:template>
</xsl:stylesheet>
The result is a set of lines, each of which doesn't exceed a fixed length of 60:
Dec. 13 — As always for a presidential inaugural, security
and surveillance were extremely tight in Washington, DC,
last January. But as George W. Bush prepared to take the
oath of office, security planners installed an extra layer
of protection: a prototype software system to detect a
biological attack. The U.S. Department of Defense, together
with regional health and emergency-planning agencies,
distributed a special patient-query sheet to military
clinics, civilian hospitals and even aid stations along the
parade route and at the inaugural balls. Software quickly
analyzed complaints of seven key symptoms — from rashes to
sore throats — for patterns that might indicate the early
stages of a bio-attack. There was a brief scare: the system
noticed a surge in flulike symptoms at military clinics.
Thankfully, tests confirmed it was just that — the flu.
Update:
In case the text comes from an XML file, this can be done with a minimal change to the above solution:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:sequence select=
      "replace(concat(normalize-space(text),' '),
               '(.{0,60}) ',
               '$1&#xA;')"/>
  </xsl:template>
</xsl:stylesheet>
Here I suppose that all the text is in the only text node child of the top element (named text) of the XML document:
<text>
Dec. 13 — As always for a presidential inaugural, security and surveillance were
extremely tight in Washington, DC, last January. But as George W. Bush prepared to
take the oath of office, security planners installed an extra layer of protection: a
prototype software system to detect a biological attack. The U.S. Department of
Defense, together with regional health and emergency-planning agencies, distributed
a special patient-query sheet to military clinics, civilian hospitals and even aid
stations along the parade route and at the inaugural balls. Software quickly
analyzed complaints of seven key symptoms — from rashes to sore throats — for
patterns that might indicate the early stages of a bio-attack. There was a brief
scare: the system noticed a surge in flulike symptoms at military clinics.
Thankfully, tests confirmed it was just that — the flu.
</text>
When this transformation is applied to the XML document above, the same result as with the first solution is produced.
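If the line length needs to be configurable, the same expression can be wrapped in a stylesheet function. The following is only a sketch: the function name my:wrap, its namespace urn:example:wrap, and the parameters pWidth and pMax are hypothetical names introduced here, not part of the solution above. It assumes the same XML input with a top element named text.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:wrap"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter: maximum line length, defaulting to 60 -->
  <xsl:param name="pWidth" as="xs:integer" select="60"/>

  <!-- Hypothetical helper: wraps $pText at spaces, at most $pMax characters per line -->
  <xsl:function name="my:wrap" as="xs:string">
    <xsl:param name="pText" as="xs:string?"/>
    <xsl:param name="pMax" as="xs:integer"/>
    <!-- Build the pattern '(.{0,N}) ' from the requested width -->
    <xsl:sequence select=
      "replace(concat(normalize-space($pText), ' '),
               concat('(.{0,', $pMax, '}) '),
               '$1&#xA;')"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:sequence select="my:wrap(string(text), $pWidth)"/>
  </xsl:template>
</xsl:stylesheet>
```

One caveat with this pattern, as far as I can tell: a single word longer than the maximum is not split. It is left intact on a line of its own that exceeds the limit, so very long tokens such as URLs will still overflow.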
I would imagine that tokenize() or <xsl:analyze-string> could be used to do this efficiently, using a regexp that allows up to (say) 70 characters, and ends with a breaking character (e.g. space).
For explicit code, see the XPath and XSLT answers at xquery word wrap.
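As a rough illustration of the xsl:analyze-string approach, here is a minimal sketch. It assumes the input is an XML document whose top element is named text, uses a width of 70 and a space as the only breaking character, and is not taken from the linked answers.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The regex attribute is an attribute value template,
         so the braces of the quantifier must be doubled -->
    <xsl:analyze-string select="concat(normalize-space(text), ' ')"
                        regex="(.{{0,70}}) ">
      <xsl:matching-substring>
        <!-- Emit the captured line without its trailing space, then a newline -->
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>&#xA;</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Leading part of a word too long to fit; copied unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Compared with a single replace() call, this form gives a place to treat each line separately, for example to add indentation or a prefix inside xsl:matching-substring.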