Transform XML file to plain text format

Tags:

xslt

I have an XML file like the following that I want to transform into a plain text file.

<?xml version="1.0" encoding="UTF-8"?>
<abc xmlns="ddn:cns-org:v5">
    <section>
        <title>PREORDER</title>
        <code code="0" cs="12.222" csn="CSN" />
        <text>
            <paragraph>
                <content>preorder.</content>
                <content></content>
            </paragraph>
            <paragraph>
                <content>preorder description.</content>
                <content>preorder detail goes here.</content>
            </paragraph>
        </text>
    </section>
    <section>
        <title>POSTORDER</title>
        <code code="0" cs="12.222" csn="CSN" />
        <text>
            <paragraph>
                <content>postorder.</content>
            </paragraph>
        </text>
    </section>
</abc>

I want the output to look like this:

PREORDER
(ignore the <code .../> line)
preorder.
(Note: if a content element is empty, i.e. <content></content>, don't output anything for it)
preorder description.
preorder detail goes here.

POSTORDER
postorder.
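To make the expected output above concrete, here is a minimal sketch of the same extraction using Python's standard xml.etree.ElementTree instead of XSLT (a swapped-in tool, shown for illustration only). It assumes the input shape shown above. The name render_simple and the prefix d are hypothetical choices. Every path uses a prefixed name, because the document declares a default namespace.

```python
import xml.etree.ElementTree as ET

# The document declares xmlns="ddn:cns-org:v5", so every element lives in
# that namespace. "d" is an arbitrary prefix bound to the same URI.
NS = {"d": "ddn:cns-org:v5"}

SAMPLE = """<abc xmlns="ddn:cns-org:v5">
    <section>
        <title>PREORDER</title>
        <code code="0" cs="12.222" csn="CSN" />
        <text>
            <paragraph>
                <content>preorder.</content>
                <content></content>
            </paragraph>
            <paragraph>
                <content>preorder description.</content>
                <content>preorder detail goes here.</content>
            </paragraph>
        </text>
    </section>
    <section>
        <title>POSTORDER</title>
        <code code="0" cs="12.222" csn="CSN" />
        <text>
            <paragraph>
                <content>postorder.</content>
            </paragraph>
        </text>
    </section>
</abc>"""


def render_simple(xml_text):
    """Title on its own line, then each non-empty <content>, one per line."""
    root = ET.fromstring(xml_text)
    lines = []
    for section in root.findall("d:section", NS):
        lines.append(section.findtext("d:title", default="", namespaces=NS))
        # Only text/paragraph/content is followed, so <code/> is never visited.
        for content in section.findall("d:text/d:paragraph/d:content", NS):
            if content.text and content.text.strip():  # skip <content></content>
                lines.append(content.text)
        lines.append("")  # blank line after each section
    return "\n".join(lines)


print(render_simple(SAMPLE))
```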

I was trying to use an XSL transform, but I don't really understand how the paths/nodes work. I got something like the following, which obviously didn't work. Please help.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    abc xmlns="ddn:cns-org:v5">
    <xsl:output method="text" omit-xml-declaration="yes"/>

    <!-- ***********************************************************
        Call template process section 
         ************************************************************-->
    <xsl:template match="/">
      <!--  <xsl:for-each select="abc"> -->
       <xsl:for-each select="section">
            <xsl:call-template name="process_section"> </xsl:call-template>
       </xsl:for-each>
      <!--  </xsl:for-each> -->
    </xsl:template>


     <xsl:template name="process_section">
        <!-- Choose : Section or Subsection  -->
        <xsl:choose>

            <!-- ***********************************************************************************
                (if section: 
                *check if not empty title  
                * Bring to upper case
                * check if already one colon, don't put any more
                * put newline 
            ******************************************************************************************-->
            <xsl:when test=" local-name(.)='abc' ">
                <xsl:if test="not(string-length(normalize-space(./section/title))=0)">
                    <xsl:variable name="title" select="string(./section/title)"/>
                    <xsl:value-of
                        select="translate($title, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
                    <xsl:if test="not( substring($title, string-length($title),1)=':' )">
                        <xsl:text>:</xsl:text>
                    </xsl:if>
                    <xsl:text>&#xa;</xsl:text>
                </xsl:if>
            </xsl:when>

            ...

Thanks, Dave


Now I need to transform an XML file into a well-formatted text file. The source XML file is:

<?xml version="1.0" encoding="UTF-8"?>
    <abc xmlns="ddn:cns-org:v5">
        <section>
            <title>PREOPERATIVE DIAGNOSIS  </title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph>
                    <content>__ c</content>
                    <content>ataract.</content>
                </paragraph>
            </text>
        </section>
        <section>
            <title>POSTOPERATIVE DIAGNOSIS</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph>
                    <content>__ c</content>
                    <content>ataract.</content>
                </paragraph>
            </text>
        </section>
        <section>
            <title>OPERATION</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph>
                    <content>Pars plana vitrectomy
                        and epiretinal membrane removal. Phacoemulsification with posterior
                        chamber lens implant __(Control #</content>
                    <content>__</content>
                    <content>) </content>
                </paragraph>
            </text>
        </section>
        <section>
            <title>SURGEON</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph>
                    <content>Dr. J. R. xxx
                    </content>
                </paragraph>
            </text>
        </section>
        <section>
            <title>DOCTORS IN ATTENDANCE</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph />
            </text>
        </section>
        <section>
            <title>ANESTHETIST</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph />
            </text>
        </section>
        <section>
            <title>ANESTHESIA</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph />
            </text>
        </section>
        <section>
            <title>CLINICAL NOTE</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph />
            </text>
        </section>
        <section>
            <title>OPERATIVE PROCEDURE</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph>
                    <content>Topical sterilization,
                        anesthesia and dilatation were carried out in the Day Surgery Unit
                        before the patient was brought to the Operating room. </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>The patient was
                        prepared with Proviodine. The patient was then draped. Special
                        attention was directed to the eyelids/lashes, prepping and draping.
                        A speculum was inserted into the fornices of the</content>
                    <content> __</content>
                    <content> eye. Subconjunctival
                        Xylocaine was given. A small conjunctival peritomy was made in the
                        area of the subconjunctival injection. Three cc of 2% Xylocaine and
                        0.75% Marcaine were injected into the sub-Tenon's space through the
                        peritomy.  </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>A paracentesis was
                        performed superiorly and inferiorly. Intraocular nonpreserved
                        Xylocaine was used. A viscoelastic was injected into the anterior
                        chamber. The eye was entered with a 3 mm keratome. The anterior
                        capsulectomy was done under a viscoelastic. Hydrodissection and
                        delineation of the nucleus was carried out. The
                        phaco-emulsification was initiated. The phacoemulsification time
                        was </content>
                    <content>__</content>
                    <content> seconds. At the
                        completion of the phacoemulsification, the cortical material was
                        irrigated and aspirated from the eye. The posterior capsule was
                        polished. The intraocular lens was inserted under a viscoelastic.
                        The intraocular lens power was __ diopters.   </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>The lens was the dialed
                        into position with the lens pick. Irrigation and aspiration of the
                        viscoelastic was then carried out. The wound was tested to ensure
                        that there was no wound leak.  </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>Additional periglobal
                        anesthesia of 2% Xylocaine and 0.75% Marcaine was injected into the
                        sub-Tenon's space through the peritomy. </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>The 25-gauge cannulas
                        were then placed inferotemporal, supratemporal and supranasally. In
                        the infratemporal cannula an infusion placed.  </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>A vitrectomy was then
                        carried out.  </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>__ (Dr. xxx will
                        refer to this area as an "open paragraph</content>
                    <content>"</content>
                    <content>- DELETE THIS
                        INFORMATION</content>
                    <content>)</content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>At the completion of
                        the vitrectomy, the peripheral retina was reinspected to ensure
                        there was no untreated peripheral retinal tears or detachment. The
                        cannulas were removed. The scleral wounds inspected and if a wound
                        leak was identified, then this was closed with 9-0 Vicryl suture.
                        Otherwise interrupted 9-0 Vicryl sutures were placed.
                        Subconjunctival Decadron and cefuroxime were given.  </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>A patch and shield
                        applied.    </content>
                </paragraph>
                <paragraph />
                <paragraph>
                    <content>The patient left the
                        Operating Room in satisfactory condition.  </content>
                </paragraph>
            </text>
        </section>
        <section>
            <title>ADDENDUM</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph />
            </text>
        </section>
        <section>
            <title>POSTOPERATIVE TOPICAL EYE DROPS</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph>
                    <content>1. Maxidex.</content>
                </paragraph>
                <paragraph>
                    <content>2. Nevanac.</content>
                </paragraph>
                <paragraph>
                    <content>3. Vigamox.</content>
                </paragraph>
            </text>
        </section>
        <section>
            <title>POSTOPERATIVE FOLLOWUP</title>
            <code code="0" cs="12.222" csn="CSN" />
            <text>
                <paragraph>
                    <content>__</content>
                </paragraph>
            </text>
        </section>
    </abc>

I'd like the output to look like the following:

PREOPERATIVE DIAGNOSIS:

__ cataract.

POSTOPERATIVE DIAGNOSIS:

__ cataract.

OPERATION:

Pars plana vitrectomy and epiretinal membrane removal.  Phacoemulsification with posterior chamber lens implant __(Control #__) 

SURGEON:

Dr. J. R. xxx 

DOCTORS IN ATTENDANCE:

ANESTHETIST:

ANESTHESIA:

CLINICAL NOTE:

OPERATIVE PROCEDURE:

Topical sterilization, anesthesia and dilatation were carried out in the Day Surgery Unit before the patient was brought to the Operating room. 

The patient was prepared with Proviodine.  The patient was then draped. Special attention was directed to the eyelids/lashes, prepping and draping.   A speculum was inserted into the fornices of the __ eye.  Subconjunctival Xylocaine was given.  A small conjunctival peritomy was made in the area of the subconjunctival injection.  Three cc of 2% Xylocaine and 0.75% Marcaine were injected into the sub-Tenon's space through the peritomy.  

A paracentesis was performed superiorly and inferiorly. Intraocular nonpreserved Xylocaine was used. A viscoelastic was injected into the anterior chamber.  The eye was entered with a 3 mm keratome.  The anterior capsulectomy was done under a viscoelastic. Hydrodissection and delineation of the nucleus was carried out.  The phaco-emulsification was initiated.  The phacoemulsification time was __ seconds.  At the completion of the phacoemulsification, the cortical material was irrigated and aspirated from the eye.  The posterior capsule was polished.  The intraocular lens was inserted under a viscoelastic.  The intraocular lens power was __  diopters.   

The lens was the dialed into position with the lens pick.  Irrigation and aspiration of the viscoelastic was then carried out.  The wound was tested to ensure that there was no wound leak.  

Additional periglobal anesthesia of 2% Xylocaine and 0.75% Marcaine was injected into the sub-Tenon's space through the peritomy. 

The 25-gauge cannulas were then placed inferotemporal, supratemporal and supranasally.  In the infratemporal cannula an infusion placed.  

A vitrectomy was then carried out.  

__ (Dr. xxx will refer to this area as an "open paragraph"- DELETE THIS INFORMATION)

At the completion of the vitrectomy, the peripheral retina was reinspected to ensure there was no untreated peripheral retinal tears or detachment.  The cannulas were removed.  The scleral wounds inspected and if a wound leak was identified, then this was closed with 9-0 Vicryl suture.   Otherwise interrupted 9-0 Vicryl sutures were placed.  Subconjunctival Decadron and cefuroxime were given.  

A patch and shield applied.    

The patient left the Operating Room in satisfactory condition.  

ADDENDUM:

POSTOPERATIVE TOPICAL EYE DROPS:

1.  Maxidex.
2.  Nevanac.
3.  Vigamox.

POSTOPERATIVE FOLLOWUP:
__

The transform rules are:

Sections should be separated by a blank line.

Section titles should be in upper case, trimmed of surrounding whitespace, and followed by a colon (:).

Each title should have a blank line before and after it.

Each paragraph should have a blank line before and after it.

All content elements inside one paragraph should be merged into a single line.
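The rules above can be sketched outside XSLT to pin down the intended behaviour. This is a minimal Python version using the standard library's xml.etree.ElementTree, not a drop-in replacement for the stylesheet. The names render, format_title and squash are hypothetical, and the sample is a trimmed copy of the source above. It concatenates the content elements of a paragraph before collapsing whitespace, which is what turns "__ c" plus "ataract." into "__ cataract.".

```python
import xml.etree.ElementTree as ET

NS = {"d": "ddn:cns-org:v5"}  # arbitrary prefix bound to the document's namespace

SAMPLE = """<abc xmlns="ddn:cns-org:v5">
    <section>
        <title>PREOPERATIVE DIAGNOSIS  </title>
        <code code="0" cs="12.222" csn="CSN" />
        <text>
            <paragraph>
                <content>__ c</content>
                <content>ataract.</content>
            </paragraph>
        </text>
    </section>
    <section>
        <title>OPERATION</title>
        <code code="0" cs="12.222" csn="CSN" />
        <text>
            <paragraph>
                <content>Pars plana vitrectomy
                    and epiretinal membrane removal. Phacoemulsification with posterior
                    chamber lens implant __(Control #</content>
                <content>__</content>
                <content>) </content>
            </paragraph>
        </text>
    </section>
    <section>
        <title>ANESTHETIST</title>
        <code code="0" cs="12.222" csn="CSN" />
        <text>
            <paragraph />
        </text>
    </section>
    <section>
        <title>POSTOPERATIVE FOLLOWUP</title>
        <code code="0" cs="12.222" csn="CSN" />
        <text>
            <paragraph>
                <content>__</content>
            </paragraph>
        </text>
    </section>
</abc>"""


def squash(text):
    """Trim and collapse whitespace runs to one space (close to XPath normalize-space())."""
    return " ".join(text.split())


def format_title(raw):
    """Trimmed and upper-cased, with a trailing colon added if missing."""
    title = squash(raw).upper()
    return title if title.endswith(":") else title + ":"


def render(xml_text):
    root = ET.fromstring(xml_text)
    blocks = []
    for section in root.findall("d:section", NS):
        blocks.append(format_title(section.findtext("d:title", default="", namespaces=NS)))
        for para in section.findall("d:text/d:paragraph", NS):
            # Concatenate the <content> pieces as-is, then squash, so line
            # breaks inside a piece become single spaces.
            pieces = ("".join(c.itertext()) for c in para.findall("d:content", NS))
            merged = squash("".join(pieces))
            if merged:  # <paragraph/> produces nothing
                blocks.append(merged)
    # A blank line between every title and paragraph, and thus between sections.
    return "\n\n".join(blocks) + "\n"


print(render(SAMPLE))
```

This follows the written rules literally, so consecutive paragraphs such as the numbered eye drops would come out separated by blank lines, unlike the hand-written sample, which keeps them together.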

I found some similar code in the project that I'd like to reuse. However, I am struggling to figure out the XPath functions/variables. I really want to combine it with the code Mads provided, but I failed.

Here is the code I intended to run.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="ddn:cns-org:v5">
    <xsl:output method="text" omit-xml-declaration="yes"/>

    <!-- ***********************************************************
        Call template process section 
         ************************************************************-->
    <xsl:template match="/">
      <xsl:for-each select="d:abc/d:section">  <!-- Is it OK?-->    
            <xsl:call-template name="process_section"> </xsl:call-template>      
      </xsl:for-each>
    </xsl:template>


     <xsl:template name="process_section">

                <xsl:if test="not(string-length(normalize-space(./section/title))=0)">
                    <xsl:variable name="title" select="string(./section/title)"/>
                    <xsl:value-of
                        select="translate($title, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
                    <xsl:if test="not( substring($title, string-length($title),1)=':' )">
                        <xsl:text>:</xsl:text>
                    </xsl:if>
                    <xsl:text>&#xa;</xsl:text>
                </xsl:if>

                <xsl:if test="not(string-length(normalize-space(./section/title))=0)">
                    <xsl:variable name="title" select="string(./section/title)"/>
                    <xsl:call-template name="TitleCase">
                        <!-- <xsl:with-param name="text" select="./section/title"/> -->
                        <xsl:with-param name="title" select="$title"/>
                        <xsl:with-param name="start" select="true()"/>
                    </xsl:call-template>
                    <xsl:if test="not( substring($title, string-length($title),1)=':' )">
                        <xsl:text>:</xsl:text>
                    </xsl:if>
                    <xsl:text>&#xa;</xsl:text>
                </xsl:if>


        <!-- *****************************************************************************************
          Start processing Text .Paragraph/list under section/text. 
        ****************************************************************************************************-->
        <xsl:for-each select="./section/text/node()">
            <xsl:choose>
                <!-- ************************************************************************************
                    if a node under text is a paragraph
                    *Replace single ':' (no caret / no digit in front) with caret followed by colon '^:'
                    *Also, do not put newline if there is nothing within paragraph
                    *(level section/text/paragraph or node())
                ******************************************************************************************-->
                <xsl:when test="local-name()='paragraph'">

                    <xsl:for-each
                        select="./content[not(@styleCode='hidden' or @styleCode='bookmark')]">

                        <xsl:choose>


                            <xsl:when test="contains(., ':')">
                                <xsl:call-template name="replaceColon">
                                    <xsl:with-param name="text" select="."/>
                                </xsl:call-template>
                            </xsl:when>
                            <xsl:otherwise>
                                <xsl:value-of select="."/>
                            </xsl:otherwise>

                        </xsl:choose>
                    </xsl:for-each>

                    <xsl:if
                        test="(count(./content[not(@styleCode='hidden' or @styleCode='bookmark')]) &gt; 0) or (count(./content)=0)">
                        <xsl:text>&#xa;</xsl:text>
                    </xsl:if>
                </xsl:when>

                <!-- otherwise it has to be list -->
                <xsl:when test="local-name()='list'">
                    <xsl:call-template name="listProcessing"> </xsl:call-template>
                </xsl:when>
            </xsl:choose>

        </xsl:for-each>
        <!-- ********************************************************************
            Done w/ Text (paragraph/list) done. Go back to (level component) 
        **************************************************************************-->


        <xsl:if
            test="string-length(normalize-space(.//content[not(@styleCode='hidden' or @styleCode='bookmark')]/text())) &gt; 0">
            <xsl:text>&#xa;</xsl:text>
        </xsl:if>

        <!-- Subsection. (Level component)  -->
        <xsl:for-each select="./section/component">
            <xsl:if test="not(./section/code/@code='9001')">
                <xsl:call-template name="process_section"/>
            </xsl:if>
        </xsl:for-each>

        <!-- ******************************************
            end of processing component
            *******************************************-->
    </xsl:template>

    <xsl:template name="replaceColon">
        <xsl:param name="text"/>
        <xsl:variable name="before" select="substring-before($text,':')"/>
        <xsl:variable name="after" select="substring-after($text,':')"/>
        <xsl:choose>
            <xsl:when test="contains($before, ':')">
                <xsl:call-template name="replaceColon">
                    <xsl:with-param name="text" select="$before"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:choose>
                    <xsl:when
                        test="string (number(substring($before,string-length($before),1))) != 'NaN'
                        or string (substring($before,string-length($before),1)) = '^'">
                        <xsl:value-of select="$before"/>
                        <xsl:text>:</xsl:text>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="$before"/>
                        <xsl:text>^:</xsl:text>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:otherwise>
        </xsl:choose>

        <xsl:choose>
            <xsl:when test="contains($after, ':')">
                <xsl:call-template name="replaceColon">
                    <xsl:with-param name="text" select="$after"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$after"/>
            </xsl:otherwise>
        </xsl:choose>

    </xsl:template>

    <!-- *********************************************************
        Beginning listProcessing. (Level ./section/text/list)
        *********************************************************** -->
    <xsl:template name="listProcessing">

        <!-- ***iterate each item under list (Level ./section/text/list/item)********* -->
        <xsl:for-each select="./item">
            <xsl:value-of select="position()"/>
            <xsl:value-of select="'. '"/>

            <!-- Checking node under item. (Level ./section/text/list/item/node()) -->
            <xsl:for-each select="./node()">
                <xsl:choose>
                    <!-- ******************************
                        Make sure the paragraph has content, otherwise don't print it or the newline. (CP-5076)
                        ******************************** -->
                    <xsl:when test="local-name(.)='paragraph' and string-length(.)>0">
                        <!-- print paragraph content -->
                        <xsl:for-each
                            select="./content[not(@styleCode='hidden' or @styleCode='bookmark')]">
                            <xsl:choose>
                                <xsl:when test="contains(., ':')">
                                    <xsl:call-template name="replaceColon">
                                        <xsl:with-param name="text" select="."/>
                                    </xsl:call-template>
                                </xsl:when>
                                <xsl:otherwise>
                                    <xsl:value-of select="."/>
                                </xsl:otherwise>
                            </xsl:choose>
                        </xsl:for-each>
                        <xsl:text>&#xa;</xsl:text>
                    </xsl:when>
                    <!-- ******* otherwise it has to be list with a list *********** -->
                    <xsl:when test="local-name(.)='list'">
                        <!-- ******call listProcessing recursively****** -->
                        <xsl:call-template name="listProcessing"/>
                    </xsl:when>
                </xsl:choose>
            </xsl:for-each>
            <!-- node -->
        </xsl:for-each>
        <!-- item -->

        <!-- *********** End listProcessing******** -->
    </xsl:template>


    <!-- *****************************************************
    ******** Title Case**************** *********************
    ****************************************************-->

    <xsl:template name="TitleCase">
        <xsl:param name="title"/>
        <xsl:param name="start"/>

        <xsl:choose>
            <xsl:when test="$start=true()">
                <xsl:choose>
                    <xsl:when test="contains(substring($title,1,1),'[')">
                        <!-- found start bracket -->
                        <xsl:variable name="romanNumneral"
                            select="substring-before(substring-after($title,'['),']')"/>
                        <xsl:value-of select="$romanNumneral"/>
                        <xsl:variable name="title2"
                            select="substring($title,string-length($romanNumneral)+3)"/>
                        <xsl:if test="string-length($title2) > 0">
                            <!-- this is end condition for recursive call -->
                            <xsl:call-template name="TitleCase">
                                <xsl:with-param name="title" select="$title2"/>
                                <xsl:with-param name="start" select="true()"/>
                                <!-- "[IV] axis" in this case it should become "IV Axis", so call with true -->
                            </xsl:call-template>
                        </xsl:if>
                    </xsl:when>
                    <xsl:otherwise>
                        <!-- title case me -->
                        <xsl:value-of
                            select="translate(substring($title,1,1),'abcdefghijklmnopqrstuvwxyz:','ABCDEFGHIJKLMNOPQRSTUVWXYZ:')"/>
                        <xsl:variable name="title2" select="substring($title,2)"/>
                        <xsl:if test="string-length($title2) > 0">
                            <xsl:call-template name="TitleCase">
                                <xsl:with-param name="title" select="$title2"/>
                                <xsl:with-param name="start" select="false()"/>
                            </xsl:call-template>
                        </xsl:if>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:when>
            <!-- start test ends -->
            <xsl:when test="contains(substring($title,1,1),'[')">
                <!-- roman test -->
                <xsl:variable name="romanNumneral"
                    select="substring-before(substring-after($title,'['),']')"/>
                <xsl:value-of select="$romanNumneral"/>
                <xsl:variable name="title2"
                    select="substring($title,string-length($romanNumneral)+3)"/>
                <xsl:if test="string-length($title2) > 0">
                    <xsl:call-template name="TitleCase">
                        <xsl:with-param name="title" select="$title2"/>
                        <xsl:with-param name="start" select="false()"/>
                    </xsl:call-template>
                </xsl:if>
            </xsl:when>
            <!-- end Roman test -->
            <xsl:otherwise>
                <!-- small-test case start -->
                <xsl:value-of
                    select="translate(string(substring($title,1,1)), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ:', 'abcdefghijklmnopqrstuvwxyz:')"/>
                <xsl:variable name="title2" select="substring($title,2)"/>
                <xsl:if test="string-length($title2) > 0">
                    <xsl:choose>
                        <xsl:when test="not(contains($title2, ']'))">

                            <xsl:value-of
                                select="translate($title2, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ:', 'abcdefghijklmnopqrstuvwxyz:')"/>


                            <!--   <xsl:value-of select="$title2"/> -->
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:call-template name="TitleCase">
                                <xsl:with-param name="title" select="$title2"/>
                                <xsl:with-param name="start" select="false()"/>
                            </xsl:call-template>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:if>
            </xsl:otherwise>
            <!-- small-case test ends here -->
        </xsl:choose>
    </xsl:template>


    </xsl:stylesheet>
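The replaceColon template above splits its input at each colon and writes the colon as ^: unless the character just before it is a digit or already a caret. (Its first when branch, contains($before, ':'), can never be true, since substring-before stops at the first colon.) A small Python mirror of that rule, under the hypothetical name replace_colon, makes it easy to check what the template will produce for a given string. It is a sketch of the template's observed behaviour, not project code.

```python
DIGITS = set("0123456789")


def replace_colon(text):
    """Mirror of the replaceColon template: ':' becomes '^:' unless the
    preceding character is a digit or '^' (a leading colon, or one right
    after another colon, also gets the caret)."""
    out = []
    prev = ""  # character before the current one in the *original* text
    for ch in text:
        if ch == ":" and prev not in DIGITS and prev != "^":
            out.append("^:")
        else:
            out.append(ch)
        prev = ch
    return "".join(out)


print(replace_colon("Time: 10:30"))  # Time^: 10:30
```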

Please help.

Thanks, Dave

dddliao asked Mar 24 '26 16:03


1 Answer

The elements in your XML document are bound to a namespace. It is easy to miss, because there is no namespace prefix, but the declaration on the document element, xmlns="ddn:cns-org:v5", binds that element (and its descendants) to that namespace.

In XSLT 1.0, an unprefixed name such as section in an XPath expression always means an element in no namespace, so it never matches these elements. To address them with XPath in your XSLT, you need to declare that namespace with a prefix and use that prefix in your match and select expressions.

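<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="ddn:cns-org:v5">
    <xsl:output method="text" omit-xml-declaration="yes"/>

    <!-- Elements with no template below (d:abc, d:code, d:text, d:paragraph,
         d:content) fall through to the built-in rule, which just processes
         their children. -->

    <!-- Non-blank text: copy it and end the line -->
    <xsl:template match="text()[normalize-space()]">
        <xsl:value-of select="."/>
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>

    <!-- Whitespace-only text (indentation between elements): output nothing -->
    <xsl:template match="text()[not(normalize-space())]"/>

    <!-- Each section: process its children, then add a blank line -->
    <xsl:template match="d:section">
        <xsl:apply-templates/>
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>

    <!-- Title: upper-case it and append a colon unless it already ends with one -->
    <xsl:template match="d:title">
        <xsl:value-of select="translate(., 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
        <xsl:if test="not( substring(., string-length(.),1)=':' )">
            <xsl:text>:</xsl:text>
        </xsl:if>
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>

</xsl:stylesheet>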
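The effect of the default namespace can be checked outside XSLT as well. Below is a small sketch using Python's standard xml.etree.ElementTree (a different tool from XSLT, used here only to illustrate the point): an unprefixed name finds nothing, while a name qualified through a prefix bound to ddn:cns-org:v5 matches. The prefix d is an arbitrary choice, just as in the stylesheet above, and the trimmed-down document is illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<abc xmlns="ddn:cns-org:v5">
    <section><title>PREORDER</title></section>
    <section><title>POSTORDER</title></section>
</abc>"""

root = ET.fromstring(DOC)

# An unprefixed name means "no namespace", so it matches none of the children.
unqualified = root.findall("section")

# Binding a prefix to the document's namespace URI makes the path match.
NS = {"d": "ddn:cns-org:v5"}
qualified = root.findall("d:section", NS)

# ElementTree reports the expanded name {namespace-uri}local-name.
print(root.tag)  # {ddn:cns-org:v5}abc
print(len(unqualified), len(qualified))  # 0 2
print([s.findtext("d:title", namespaces=NS) for s in qualified])  # ['PREORDER', 'POSTORDER']
```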
Mads Hansen answered Mar 28 '26 03:03