Extract and Parse Huge incomplete JSON in NodeJS

Imagine a scenario where I have a huge JSON file on a GitHub gist. That JSON is an array of objects and it has 30k+ lines. Now I want to perform ETL (Extract, Transform, Load) on that data directly from the GitHub gist into my database. Unfortunately, the last object of that JSON is incomplete and I don't have any control over the external data source, which means, in a simple demonstration, I'm getting the data like this:

[{ "name": { "first": "foo", "last": "bar" } }, { "name": { "first": "ind", "last": "go

What is the best practice for extracting such a huge JSON file and parsing it correctly in Node.js?

I've tried parsing it with the regular JSON.parse() and with an npm package named partial-json-parser, but neither helped.

Edit

I've found a solution from an external source which solves both the incomplete-JSON and the ETL issue. I'm pasting that snippet here:

import fetch from "node-fetch";
import StreamArray from "stream-json/streamers/StreamArray.js";

const main = async () => {
  // Fetch the gist as a readable stream instead of buffering the whole body.
  const invalidJSON = await fetch(
    "<raw_gist_array_of_objects_api_endpoint>"
  ).then((r) => r.body);

  const finalData = [];
  // StreamArray parses a top-level JSON array and emits one event per element.
  const pipeline = invalidJSON.pipe(StreamArray.withParser());

  pipeline.on("data", (data) => {
    finalData.push(data.value);
  });

  // Resolve on "end" or "error": the truncated last object triggers a parse
  // error, but every complete object emitted before it is already collected.
  await new Promise((r) => {
    pipeline.on("end", r);
    pipeline.on("error", r);
  });

  console.log(finalData);
};

main();
asked Sep 16 '25 by Mahtab Hossain


1 Answer

I think you need to fix the JSON structure first. Try this approach:

import untruncateJson from "untruncate-json";

const str = `[{ "name": { "first": "foo", "last": "bar" } }, { "name": { 
"first": "ind", "last": "go`;

// The package ships a CommonJS default export, hence the .default access.
const fixJson = untruncateJson.default;

// fixJson appends the missing quotes, braces, and brackets and returns a
// syntactically valid JSON *string*; pass it to JSON.parse to get objects.
const json = fixJson(str);

console.log(json);
answered Sep 17 '25 by Moniruzzaman Dipto