
Parse text file with XSLT

I have a plain text file structured like this:

!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
...

Is it possible to use XSLT to produce a file similar to this:

<?xml version="1.0" encoding="UTF-8" ?>
<document>
  <ITEM_NAME>Item value</ITEM_NAME>
  <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
  ...
</document>

EDIT

I am sorry I did not state this clearly before: I am trying to accomplish this transformation with the Visual Studio 2005 XSLT engine. I have tried both of the provided solutions, and I am sure that they are correct, but Visual Studio 2005 does not recognize the unparsed-text function.

sblandin asked Apr 12 '13 14:04


2 Answers

This XSLT 2.0 transformation:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Read the text file and strip carriage returns, so lines end in \n only -->
 <xsl:variable name="vText" select=
 "replace(unparsed-text('file:///c:/temp/delete/text.txt'),'\r','')"/>

 <xsl:template match="/">
  <document>
      <!-- Group 2 is the name after "!", group 3 is the line that follows it -->
      <xsl:analyze-string select="$vText" regex="(!(.+?)\n([^\n]+))+">
       <xsl:matching-substring>
         <xsl:element name="{regex-group(2)}">
                <xsl:sequence select="regex-group(3)"/>
         </xsl:element>
       </xsl:matching-substring>
       <!-- Text outside any name/value pair is copied through unchanged -->
       <xsl:non-matching-substring><xsl:sequence select="."/></xsl:non-matching-substring>
      </xsl:analyze-string>
  </document>
 </xsl:template>
</xsl:stylesheet>

when applied to any XML document (its content is not used), with the provided text residing in the local file C:\temp\delete\Text.txt:

!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
...

produces the wanted, correct result:

<document>
   <ITEM_NAME>Item value</ITEM_NAME>
   <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
...
</document>

To test more completely, we put this text in the file:

As is text
!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
As is text2
!TEST_BANG
Here's a value with !bangs!!!
!TEST2_BANG
 !!!Here's a value with !more~ !bangs!!!
As is text3

The transformation again produces the wanted, correct result:

<document>As is text
<ITEM_NAME>Item value</ITEM_NAME>
<ANOTHER_ITEM>Its value</ANOTHER_ITEM>
As is text2
<TEST_BANG>Here's a value with !bangs!!!</TEST_BANG>
<TEST2_BANG> !!!Here's a value with !more~ !bangs!!!</TEST2_BANG>
As is text3
</document>
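
If supplying a dummy XML document is inconvenient, XSLT 2.0 also allows a transformation to start at a named template. The following is only a sketch of that variant, not part of the tested solution above: the template name main and the parameter text-uri are hypothetical names chosen for illustration, and how the initial template is selected depends on the processor (Saxon, for example, accepts -it:main on its command line). The outer group and trailing + of the original regex appear to be redundant, so they are dropped here and the group numbers shift to 1 and 2.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical parameter: lets the caller override the file location -->
  <xsl:param name="text-uri" select="'file:///c:/temp/delete/text.txt'"/>

  <!-- Named entry point: no source document is needed to start here -->
  <xsl:template name="main">
    <document>
      <!-- Group 1 is the name after "!", group 2 is the line that follows it -->
      <xsl:analyze-string select="replace(unparsed-text($text-uri), '\r', '')"
                          regex="!(.+?)\n([^\n]+)">
        <xsl:matching-substring>
          <xsl:element name="{regex-group(1)}">
            <xsl:sequence select="regex-group(2)"/>
          </xsl:element>
        </xsl:matching-substring>
        <!-- Text outside any name/value pair is copied through unchanged -->
        <xsl:non-matching-substring>
          <xsl:sequence select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </document>
  </xsl:template>
</xsl:stylesheet>
```

As in the original, an item name that is not a valid XML element name (one containing a space, for instance) makes xsl:element fail, so this sketch assumes well-behaved names.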
Dimitre Novatchev answered Sep 30 '22 00:09


If you can use XSLT 2.0, you could use unparsed-text()...

Text File (Do not use the text file as direct input to the XSLT.)

!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
!TEST_BANG
Here's a value with !bangs!!!

XSLT 2.0 (Apply this stylesheet to itself, that is, use the stylesheet as the XML input. You will also have to change the path to your text file, and you might have to change the encoding.)

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="text-encoding" as="xs:string" select="'iso-8859-1'"/>
    <xsl:param name="text-uri" as="xs:string" select="'file:///C:/Users/dhaley/Desktop/test.txt'"/>

    <xsl:template name="text2xml">
        <xsl:variable name="text" select="unparsed-text($text-uri, $text-encoding)"/>
        <xsl:analyze-string select="$text" regex="!(.*)\n(.*)">
            <xsl:matching-substring>
                <xsl:element name="{normalize-space(regex-group(1))}">
                    <xsl:value-of select="normalize-space(regex-group(2))"/>
                </xsl:element>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <xsl:template match="/">
        <document>
            <xsl:choose>
                <xsl:when test="unparsed-text-available($text-uri, $text-encoding)">
                    <xsl:call-template name="text2xml"/>                                
                </xsl:when>
                <xsl:otherwise>
                    <xsl:variable name="error">
                        <xsl:text>Error reading "</xsl:text>
                        <xsl:value-of select="$text-uri"/>
                        <xsl:text>" (encoding "</xsl:text>
                        <xsl:value-of select="$text-encoding"/>
                        <xsl:text>").</xsl:text>
                    </xsl:variable>
                    <xsl:message><xsl:value-of select="$error"/></xsl:message>
                    <xsl:value-of select="$error"/>
                </xsl:otherwise>
            </xsl:choose>
        </document>
    </xsl:template>
</xsl:stylesheet>

XML Output

<document>
   <ITEM_NAME>Item value</ITEM_NAME>
   <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
   <TEST_BANG>Here's a value with !bangs!!!</TEST_BANG>
</document>
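
For an XSLT 1.0 processor, such as the one mentioned in the question's edit, unparsed-text() is not available. One possible workaround, given here only as a rough sketch, is to pull the text file into a small wrapper XML document through an external entity and split it with a recursive template. The wrapper file, its text element and the entity name data are all hypothetical. The sketch also assumes that the text file is UTF-8 or UTF-16, that it contains no < or & characters, and that the XML parser is allowed to resolve external entities (some configurations disable this by default).

Wrapper document used as the XML input:

```xml
<!DOCTYPE text [<!ENTITY data SYSTEM "text.txt">]>
<text>&data;</text>
```

XSLT 1.0 stylesheet applied to the wrapper:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <document>
      <xsl:call-template name="lines">
        <!-- Drop carriage returns and make sure the last line ends in a newline -->
        <xsl:with-param name="text"
                        select="concat(translate(., '&#13;', ''), '&#10;')"/>
      </xsl:call-template>
    </document>
  </xsl:template>

  <!-- Consumes the text one line at a time -->
  <xsl:template name="lines">
    <xsl:param name="text"/>
    <xsl:if test="contains($text, '&#10;')">
      <xsl:variable name="line" select="substring-before($text, '&#10;')"/>
      <xsl:variable name="rest" select="substring-after($text, '&#10;')"/>
      <xsl:choose>
        <!-- A "!" line followed by another line: emit one element, skip both lines -->
        <xsl:when test="starts-with($line, '!') and contains($rest, '&#10;')">
          <xsl:element name="{normalize-space(substring($line, 2))}">
            <xsl:value-of select="substring-before($rest, '&#10;')"/>
          </xsl:element>
          <xsl:call-template name="lines">
            <xsl:with-param name="text" select="substring-after($rest, '&#10;')"/>
          </xsl:call-template>
        </xsl:when>
        <!-- Any other line is skipped -->
        <xsl:otherwise>
          <xsl:call-template name="lines">
            <xsl:with-param name="text" select="$rest"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Like the XSLT 2.0 version above, this drops lines that do not belong to a name/value pair. Because the template recurses once per line, very large files may exhaust the stack on processors that do not optimize tail calls.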
Daniel Haley answered Sep 29 '22 23:09