 

Parsing XML file in Node.js

I am using Arch Linux with KDE Plasma. I have an XML file of approximately 50 MB that I need to parse. The file uses custom tags.

Example XML:

<JMdict>
   <entry>
      <ent_seq>1000000</ent_seq>
      <r_ele>
         <reb>ヽ</reb>
      </r_ele>
      <sense>
         <pos>&unc;</pos>
         <gloss g_type="expl">repetition mark in katakana</gloss>
      </sense>
   </entry>
</JMdict>

I have tried many of the solutions suggested on Stack Overflow, and none of them worked; some packages, such as xml-stream and xml2json, could not even be installed on my system. I decided to use xml2js (which most answers recommend), but got the same result. How can I use it correctly? I am using the following code, but it always returns undefined:

const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();

const path = "test.xml";

fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
     parser.parseString(data, function(err, res) {
         // err is never checked here, so a parse failure
         // only shows up as res being undefined.
         console.log(res);
     });
});

Result: undefined

Is there any way to handle an XML file by hand (without a package)?
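Handling the file without a package is only practical if you rely on its regular shape. The sketch below assumes every `<entry>` is flat and well-formed like the example above (no CDATA sections, comments, or nested tags with the same name) and pulls out the fields with regular expressions. It is not a general XML parser, and `parseEntries` is a hypothetical helper name. Entity references such as `&unc;` are left as raw text, so they cannot cause a parse error.

```javascript
// Minimal hand-rolled extractor for JMdict-style <entry> blocks.
// ASSUMPTION: entries are flat and well-formed like the example;
// this is NOT a general XML parser (no CDATA, comments, or nesting).
function parseEntries(xml) {
  const entries = [];
  const entryRe = /<entry>([\s\S]*?)<\/entry>/g;
  let match;
  while ((match = entryRe.exec(xml)) !== null) {
    const body = match[1];
    // Text of every <tag>...</tag> or <tag attr="...">...</tag> inside the entry.
    const all = (tag) =>
      [...body.matchAll(new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`, 'g'))]
        .map((m) => m[1]);
    entries.push({
      entSeq: all('ent_seq')[0],
      readings: all('reb'),
      pos: all('pos'), // entity references like &unc; stay as raw text
      glosses: all('gloss'),
    });
  }
  return entries;
}

const exampleXml = `<JMdict>
   <entry>
      <ent_seq>1000000</ent_seq>
      <r_ele>
         <reb>ヽ</reb>
      </r_ele>
      <sense>
         <pos>&unc;</pos>
         <gloss g_type="expl">repetition mark in katakana</gloss>
      </sense>
   </entry>
</JMdict>`;

console.log(parseEntries(exampleXml));
```

For the real file, something like `parseEntries(require('fs').readFileSync('test.xml', 'utf8'))` should work; a 50 MB string fits comfortably in memory, although a streaming approach would be needed for much larger inputs.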

asked Jan 01 '19 15:01 by Kaan Taha Köken



1 Answer

Here is a working example:

const fs = require('fs');
const path = require('path');
const xml2js = require('xml2js');

const parser = new xml2js.Parser();

// path.join builds a correct path on every OS (no extra "slash" package needed)
const filename = path.join(__dirname, 'foo.xml');

fs.readFile(filename, 'utf8', function (err, data) {
    if (err) {
        console.log('Error reading file:');
        console.log(err);
        return;
    }

    // Escape every "&" that does not start a predefined entity
    // (&apos; &quot; &gt; &lt; &amp;) or a numeric reference (&#...),
    // so &unc; is parsed as the literal text "&unc;" instead of failing.
    const escaped = data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');

    parser.parseString(escaped, function (err, result) {
        if (err) {
            console.log('Error parsing XML:');
            console.log(err);
        } else {
            console.log(JSON.stringify(result));
            console.log('Done');
        }
    });
});

The essential step is this replacement, which escapes the bare ampersands before the XML is parsed:

data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;')
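To see what the replacement does on its own, here is a small sketch that wraps the same regular expression in a helper (`escapeBareAmpersands` is a hypothetical name) and applies it to a short string. Only the `&` of `&unc;` is rewritten; the predefined entity `&amp;` and the numeric reference `&#12541;` are left untouched.

```javascript
// Escapes every "&" that does not already start one of the five predefined
// XML entities (&apos; &quot; &gt; &lt; &amp;) or a numeric reference (&#...).
function escapeBareAmpersands(xml) {
  return xml.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');
}

const sample = '<pos>&unc;</pos><gloss>salt &amp; pepper &#12541;</gloss>';
console.log(escapeBareAmpersands(sample));
// prints: <pos>&amp;unc;</pos><gloss>salt &amp; pepper &#12541;</gloss>
```

After parsing, the `<pos>` element therefore contains the literal text `&unc;` rather than an expanded value.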

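The only problem is this tag: &unc; is not one of the five predefined XML entities, so the parser reports an error (and the result is undefined) unless the ampersand is escaped.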

<pos>&unc;</pos>
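Before applying the blanket escape to a 50 MB file, it can help to confirm which entity references would trip the parser. This sketch (`findUndeclaredEntities` is a hypothetical helper name) counts every named entity reference that is not one of the five predefined XML entities; numeric references such as `&#12541;` are ignored. It does not read any DTD, so it reports every name that would need a declaration.

```javascript
// The five entities every XML parser knows without a DTD.
const PREDEFINED = new Set(['amp', 'lt', 'gt', 'quot', 'apos']);

// Returns a Map of entity name (e.g. "unc" for &unc;) -> number of occurrences,
// skipping predefined entities and numeric references like &#12541;.
function findUndeclaredEntities(xml) {
  const counts = new Map();
  for (const m of xml.matchAll(/&([A-Za-z_][\w.-]*);/g)) {
    const name = m[1];
    if (!PREDEFINED.has(name)) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return counts;
}

console.log(findUndeclaredEntities('<pos>&unc;</pos><pos>&n;</pos><g>a &amp; b &#12541;</g><pos>&unc;</pos>'));
// expected counts: unc -> 2, n -> 1
```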

Reference and thanks to @tim.

answered Sep 17 '22 03:09 by RGKrish183
