
Node.js example to convert XML to JSON for large XML files

I'm relatively new to Node.js. I'm trying to convert 83 XML files that are each around 400MB in size into JSON.

Each file contains data like this (except each element has a large number of additional statements):

<case-file>
  <serial-number>75563140</serial-number>
  <registration-number>0000000</registration-number>
  <transaction-date>20130101</transaction-date>
  <case-file-header>
     <filing-date>19981002</filing-date>
     <status-code>686</status-code>
     <status-date>20130101</status-date>
  </case-file-header>
  <case-file-statements>
     <case-file-statement>
        <type-code>D10000</type-code>
        <text>"MUSIC"</text>
     </case-file-statement>
     <case-file-statement>
        <type-code>GS0351</type-code>
        <text>compact discs</text>
     </case-file-statement>
  </case-file-statements>
  <case-file-event-statements>
     <case-file-event-statement>
        <code>PUBO</code>
        <type>A</type>
        <description-text>PUBLISHED FOR OPPOSITION</description-text>
        <date>20130101</date>
        <number>28</number>
     </case-file-event-statement>
     <case-file-event-statement>
        <code>NPUB</code>
        <type>O</type>
        <description-text>NOTICE OF PUBLICATION</description-text>
        <date>20121212</date>
        <number>27</number>
     </case-file-event-statement>
  </case-file-event-statements>
</case-file>

I have tried a lot of different Node modules, including sax, node-xml, node-expat and xml2json. Obviously, I need to stream the data from the file and pipe it through an XML parser and then convert it to JSON.

I have also read a number of blogs and similar resources that attempt to explain, albeit superficially, how to parse XML.

In the Node universe, I tried sax first, but I can't figure out how to extract the data in a format that I can convert to JSON. xml2json won't work on streams. node-xml looks encouraging, but I can't figure out how it parses chunks in any manner that makes sense. node-expat points to the libexpat documentation, which appears to require a Ph.D. Node elementtree does the same, pointing to the Python implementation without explaining what has been implemented or how to use it.

Can someone point me to an example that I could use to get started?

rob_hicks asked Feb 13 '13 03:02



2 Answers

Although this question is quite old, I am sharing my problem and solution, which might help anyone trying to convert XML to JSON.

The actual problem here is not the conversion but processing huge XML files without having to hold them in memory at once.
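To make that point concrete: for a single small node, the conversion itself fits in a few lines. The following is a minimal sketch using only JavaScript built-ins, assuming element-and-text-only XML like the sample in the question (no attributes, comments, CDATA, or self-closing tags). The names `xmlToObject` and `decodeEntities` are hypothetical and do not come from any package mentioned here.

```javascript
// Hypothetical helpers for illustration; not part of any package named here.
// Assumes element-and-text-only XML: no attributes, comments, CDATA,
// processing instructions, or self-closing tags.
const ENTITIES = { '&lt;': '<', '&gt;': '>', '&quot;': '"', '&apos;': "'", '&amp;': '&' };
const decodeEntities = (s) => s.replace(/&(?:lt|gt|quot|apos|amp);/g, (e) => ENTITIES[e]);
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function xmlToObject(xml) {
  // Sticky regex: an opening/closing tag, or a run of text up to the next "<".
  const token = /<(\/?)([^<>\s\/]+)>|([^<]+)/y;
  const stack = [{ name: null, children: {}, hasChildren: false, text: '' }];

  while (token.lastIndex < xml.length) {
    const at = token.lastIndex;
    const match = token.exec(xml);
    if (match === null) throw new Error(`Unsupported markup at offset ${at}`);
    const [, slash, name, text] = match;
    const top = stack[stack.length - 1];

    if (text !== undefined) {
      top.text += text; // character data belongs to the innermost open element
    } else if (slash === '') {
      stack.push({ name, children: {}, hasChildren: false, text: '' });
    } else {
      if (top.name !== name) throw new Error(`Unexpected </${name}> at offset ${at}`);
      stack.pop();
      // Leaf elements become strings; elements with children become objects.
      const value = top.hasChildren ? top.children : decodeEntities(top.text.trim());
      const parent = stack[stack.length - 1];
      parent.hasChildren = true;
      if (!hasOwn(parent.children, name)) parent.children[name] = value;
      else if (Array.isArray(parent.children[name])) parent.children[name].push(value);
      else parent.children[name] = [parent.children[name], value]; // second sibling: promote to array
    }
  }
  if (stack.length !== 1) throw new Error(`Unclosed <${stack[stack.length - 1].name}>`);
  return stack[0].children;
}

const sample =
  '<case-file-statement><type-code>GS0351</type-code><text>compact discs</text></case-file-statement>';
console.log(JSON.stringify(xmlToObject(sample)));
// prints {"case-file-statement":{"type-code":"GS0351","text":"compact discs"}}
```

Note that a name appearing once yields a plain value while a repeated name yields an array, so the output shape depends on the data. Real converters have to make a deliberate choice there.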

After trying almost all of the widely used packages, I ran into the following problems:

  • Many packages support XML-to-JSON conversion covering all scenarios, but they don't work well with large files.

  • Very few packages (like xml-flow, xml-stream) support large XML files, and their conversion misses a few corner cases where it either fails or produces an unpredictable JSON structure (explained in this SO question).

The ideal solution would combine the advantages of both approaches, which is exactly what I did with the xtreamer node package.

In simple words, xtreamer accepts the name of a repeating node, just like xml-flow / xml-stream, but emits the repeating XML nodes themselves instead of converted JSON. This has the following advantages:

  • Because xtreamer extends a transform stream, any readable stream can be piped into it.
  • The emitted XML nodes can be passed to any XML-to-JSON parser to get the desired JSON.
  • Going one step further, a JSON parser can be hooked into xtreamer, which then invokes it and emits JSON directly.
  • xtreamer has stream as its only dependency and, being a transform stream extension, can be piped flexibly with other streams.
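The repeating-node idea can be sketched with Node's built-in stream module alone. This is not xtreamer's code or API. `extractNodes` and `createNodeSplitter` are hypothetical names, and the sketch assumes the repeating tag has no attributes and never nests inside itself.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical pure helper: pull every complete <tag>...</tag> element out of
// `text`. Returns the elements plus the unconsumed tail, which must be
// prepended to the next chunk.
function extractNodes(text, tagName) {
  const open = `<${tagName}>`; // assumption: the tag carries no attributes
  const close = `</${tagName}>`;
  const nodes = [];
  let pos = 0;
  for (;;) {
    const start = text.indexOf(open, pos);
    if (start === -1) {
      // No opening tag left: keep only a tail that could be a split opening tag.
      const keepFrom = Math.max(pos, text.length - (open.length - 1));
      return { nodes, rest: text.slice(keepFrom) };
    }
    const end = text.indexOf(close, start + open.length);
    if (end === -1) {
      // Element is incomplete: keep it from its opening tag onward.
      return { nodes, rest: text.slice(start) };
    }
    nodes.push(text.slice(start, end + close.length));
    pos = end + close.length;
  }
}

// Hypothetical transform stream: bytes in, one XML element string out per push.
function createNodeSplitter(tagName) {
  const decoder = new StringDecoder('utf8'); // safe across split multi-byte characters
  let pending = '';
  return new Transform({
    readableObjectMode: true, // each push() is delivered as one whole element
    transform(chunk, _encoding, callback) {
      const { nodes, rest } = extractNodes(pending + decoder.write(chunk), tagName);
      pending = rest; // only the unfinished element stays in memory
      for (const node of nodes) this.push(node);
      callback();
    },
  });
}

// Usage sketch (the file name is a placeholder):
// require('fs').createReadStream('cases.xml')
//   .pipe(createNodeSplitter('case-file'))
//   .on('data', (nodeXml) => { /* hand nodeXml to any XML-to-JSON converter */ });
```

Memory use is bounded by the largest single element plus one chunk, not by the file size, which is what matters for 400MB inputs.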

What if the XML structure is not fixed?

I managed to come up with another sax-based node package, xtagger, which reads the XML file and reports the structure of the file in the following format:

structure: { [name: string]: { [hierarchy: number]: number } };

This package makes it possible to work out the repeating node name, which can then be passed to xtreamer for parsing.
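One way to read the structure type above, as a sketch only: each element name maps to the depths at which it occurs, and each depth maps to an occurrence count. That reading, the root-at-depth-0 convention, and the name `tallyStructure` are assumptions, not xtagger's documented behaviour. The regex scan below also ignores comments, CDATA, and `>` inside attribute values, and it works on a string, not a stream.

```javascript
// Hypothetical helper, not xtagger's API: counts how often each element
// name opens at each depth (root = depth 0, which is an assumption).
function tallyStructure(xml) {
  const structure = {};
  // Matches opening, closing, and self-closing tags; skips <?...?> and <!...>.
  const tag = /<(\/?)([A-Za-z_][^\s<>\/]*)([^<>]*)>/g;
  let depth = 0;
  let match;
  while ((match = tag.exec(xml)) !== null) {
    const [, slash, name, rest] = match;
    if (slash === '/') {
      depth -= 1; // closing tag: step back up one level
      continue;
    }
    if (!Object.prototype.hasOwnProperty.call(structure, name)) structure[name] = {};
    const byDepth = structure[name];
    byDepth[depth] = (byDepth[depth] || 0) + 1;
    if (!rest.endsWith('/')) depth += 1; // self-closing tags do not open a level
  }
  return structure;
}

const tally = tallyStructure('<root><item><id>1</id></item><item><id>2</id></item></root>');
console.log(JSON.stringify(tally));
// prints {"root":{"0":1},"item":{"1":2},"id":{"2":2}}
```

In this tally, `item` occurs twice at depth 1 directly under a single root, which marks it as the candidate repeating node.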

I hope this helps. :)

planet_hunter answered Nov 15 '22 07:11


I doubt this is still relevant after 2-3 years, but in case anyone else stumbles on this: xml-stream on npm looked rather straightforward to me.

If you're a Windows user who wants to avoid GYP, however, I put together a very simple alternative that uses sax to extract children from an XML file one by one. It's called no-gyp-xml-stream; it may not have a lot of features, but it certainly is simple to use: https://www.npmjs.com/package/no-gyp-xml-stream

Søren Ullidtz answered Nov 15 '22 05:11