 

Parse large JSON file in Nodejs and handle each object independently


I need to read a large JSON file (around 630 MB) in Node.js and insert each object into MongoDB.

I've read the answer here: Parse large JSON file in Nodejs.

However, the answers there handle the JSON file line by line rather than object by object. Thus, I still don't know how to get an object out of this file and operate on it.

There are about 100,000 objects of this kind in my JSON file.

Data Format:

[
  {
    "id": "0000000",
    "name": "Donna Blak",
    "livingSuburb": "Tingalpa",
    "age": 53,
    "nearestHospital": "Royal Children's Hospital",
    "treatments": {
        "19890803": {
            "medicine": "Stomach flu B",
            "disease": "Stomach flu"
        },
        "19740112": {
            "medicine": "Progeria C",
            "disease": "Progeria"
        },
        "19830206": {
            "medicine": "Poliomyelitis B",
            "disease": "Poliomyelitis"
        }
    },
    "class": "patient"
  },
  ...
]

Cheers,

Alex

asked Mar 20 '17 by Lixing Liang



1 Answer

There is a nice module named 'stream-json' that does exactly what you want.

It can parse JSON files far exceeding available memory.

and

StreamArray handles a frequent use case: a huge array of relatively small objects similar to Django-produced database dumps. It streams array components individually taking care of assembling them automatically.

Here is a very basic example:

const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

const jsonStream = StreamArray.withParser();

//You'll get json objects here
//Key is an array-index here
jsonStream.on('data', ({key, value}) => {
    console.log(key, value);
});

jsonStream.on('end', () => {
    console.log('All done');
});

const filename = path.join(__dirname, 'sample.json');
fs.createReadStream(filename).pipe(jsonStream.input);

If you'd like to do something more complex, e.g. process one object after another sequentially (keeping the order) and apply some async operation to each of them, then you could implement a custom Writable stream like this:

const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const path = require('path');
const fs = require('fs');

const fileStream = fs.createReadStream(path.join(__dirname, 'sample.json'));
const jsonStream = StreamArray.withParser();

const processingStream = new Writable({
    write({key, value}, encoding, callback) {
        //Save to mongo or do any other async actions

        setTimeout(() => {
            console.log(value);
            //The next record will be read only when the current one is fully processed
            callback();
        }, 1000);
    },
    //Don't skip this, as we need to operate with objects, not buffers
    objectMode: true
});

//Pipe the streams as follows
fileStream.pipe(jsonStream.input);
jsonStream.pipe(processingStream);

//So we're waiting for the 'finish' event when everything is done.
processingStream.on('finish', () => console.log('All done'));
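For the original question (inserting each object into MongoDB), the setTimeout placeholder above can be swapped for a real insert. Below is a minimal sketch, assuming the official 'mongodb' Node.js driver is installed; the connection URL, database name ('hospital') and collection name ('patients') are placeholders chosen for illustration, not part of the original answer.

const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const path = require('path');
const fs = require('fs');
const {MongoClient} = require('mongodb');

async function run() {
    //Assumes a locally running MongoDB instance; adjust the URL as needed
    const client = await MongoClient.connect('mongodb://localhost:27017');
    const collection = client.db('hospital').collection('patients');

    const fileStream = fs.createReadStream(path.join(__dirname, 'sample.json'));
    const jsonStream = StreamArray.withParser();

    const processingStream = new Writable({
        objectMode: true,
        write({key, value}, encoding, callback) {
            //Insert one document at a time; calling callback() only after the
            //insert resolves lets back-pressure keep memory usage flat
            collection.insertOne(value)
                .then(() => callback())
                .catch(callback);
        }
    });

    fileStream.pipe(jsonStream.input);
    jsonStream.pipe(processingStream);

    //Resolve once every array element has been written to MongoDB
    await new Promise((resolve, reject) => {
        processingStream.on('finish', resolve);
        processingStream.on('error', reject);
    });

    console.log('All done');
    await client.close();
}

run().catch(console.error);

If single-document inserts turn out to be a bottleneck for ~100,000 records, the same pattern works with buffering values into batches and calling insertMany per batch instead.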

Please note: the examples above were tested against a 1.x release of 'stream-json'. For some previous versions (presumably prior to 1.0.0) you might have to:

const StreamArray = require('stream-json/utils/StreamArray');

and then

const jsonStream = StreamArray.make();

answered Oct 10 '22 by Antonio Narkevich