Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

XML to CSV conversion using XQuery

I have an XML file and I need to convert it into XQuery. consider a simple set of XML:

books[book]
book[@isbn, title, descrption]

eg:

<books>
    <book isbn="1590593049">
        <title>Extending Flash MX 2004</title>
        <description>
        Using javascript alongwith actionscript 3.0 and mxml.</description>
    </book>
    <book isbn="0132149184">
        <title>Java Software Solutions</title>
        <description>
            Complete book full of case studies on business solutions and design concepts while building mission critical
            business applications.
        </description>
    </book>

How to convert it to CSV format using XQuery? The CSV is used by Microsoft excel,

so it would be delimited by comma (,) character and special characters should be escaped.

like image 694
Ranjan Sarma Avatar asked Dec 30 '12 06:12

Ranjan Sarma


2 Answers

A pure XPath 2.0 expression:

for $b in /*/book
    return
      concat(escape-html-uri(string-join(($b/@isbn,
                                          $b/title,
                                          $b/description
                                          )
                                           /normalize-space(),
                                        ",")
                             ),
             codepoints-to-string(10))

XSLT 2 - based verification:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:sequence select=
   "for $b in /*/book
       return
         concat(escape-html-uri(string-join(($b/@isbn,
                                             $b/title,
                                             $b/description
                                             )
                                              /normalize-space(),
                                           ',')
                                ),
                codepoints-to-string(10))"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document (corrected from its malformedness):

<books>
    <book isbn="1590593049">
        <title>Extending Flash MX 2004</title>
        <description>
        Using javascript alongwith actionscript 3.0 and mxml.</description>
    </book>
    <book isbn="0132149184">
        <title>Java Software Solutions</title>
        <description>
            Complete book full of case studies on business solutions and design concepts while building mission critical
            business applications.
        </description>
    </book>
</books>

the wanted, correct result is produced:

1590593049,Extending Flash MX 2004,Using javascript alongwith actionscript 3.0 and mxml.
 0132149184,Java Software Solutions,Complete book full of case studies on business solutions and design concepts while building mission critical business applications.

Update:

In a comment the OP has requested that any in-text comma be surrounded by a quote and that (after that) any quote be replaced by two quotes, and, finally, if the wholw result contains a quote, it must be surrounded by (single) quotes.

Here is a pure XPath 2.0 expression that produces this:

for $b in /*/book,
    $q in codepoints-to-string(34),
    $NL in codepoints-to-string(10),
    $isbn in normalize-space(replace($b/@isbn, ',', concat($q,',',$q))),
    $t in normalize-space(replace($b/title, ',', concat($q,',',$q))),
    $d in normalize-space(replace($b/description, ',', concat($q,',',$q))),
    $res in
     escape-html-uri(string-join(($isbn,$t,$d), ',')),
    $res2 in replace($res, $q, concat($q,$q))
   return
    if(contains($res2, $q))
       then concat($q, $res2, $q, $NL)
       else concat($res2, $NL)

When this XPath expression is evaluated against this (extended with a new test-case) XML document:

<books>
    <book isbn="1590593049">
        <title>Extending Flash MX 2004</title>
        <description>
        Using javascript alongwith actionscript 3.0 and mxml.</description>
    </book>
    <book isbn="0132149184">
        <title>Java Software Solutions</title>
        <description>
            Complete book full of case studies on business solutions and design concepts while building mission critical
            business applications.
        </description>
    </book>
    <book isbn="XX1234567">
        <title>Quotes and comma</title>
        <description>
            Hello, World from "Ms-Excel"
        </description>
    </book>
</books>

the wanted, correct result is produced:

1590593049,Extending Flash MX 2004,Using javascript alongwith actionscript 3.0 and mxml.
0132149184,Java Software Solutions,Complete book full of case studies on business solutions and design concepts while building mission critical business applications.
"XX1234567,Quotes and comma,Hello"","" World from ""Ms-Excel"""
like image 110
Dimitre Novatchev Avatar answered Oct 21 '22 12:10

Dimitre Novatchev


Assuming your xml is in the variable $books you could create a csv file with each book node on a new line using this:

declare function local:my-replace($input) {
  for $i in $input
  return '"' || replace($i, '"', '""') || '"'
};
for $book in $books//book
return string-join(local:my-replace(($book/@isbn, $book/title, $book/description)), ",") || '&#xa;'

string-join concatenates the different strings, the local function my-replace replaces the values in the sequence according to your specification.

like image 29
dirkk Avatar answered Oct 21 '22 12:10

dirkk