 

Large CSV to JSON/Object in Node.js

I am trying to do something that seems like it should be not only fairly simple to accomplish, but also a common enough task that there would be straightforward packages available for it. I want to take a large CSV file (an export from a relational database table) and convert it to an array of JavaScript objects. Furthermore, I would like to export it to a .json file fixture.

Example CSV:

a,b,c,d
1,2,3,4
5,6,7,8
...

Desired JSON:

[
{"a": 1,"b": 2,"c": 3,"d": 4},
{"a": 5,"b": 6,"c": 7,"d": 8},
...
]

I've tried several Node CSV parsers, streamers, and self-proclaimed CSV-to-JSON libraries, but I can't seem to get the result I want, or when I can, it only works with smaller files. My file is nearly 1 GB with roughly 40 million rows (which would create 40 million objects). I expect this requires streaming the input and/or output to avoid memory problems.

Here are the packages I've tried:

  • https://github.com/klaemo/csv-stream
  • https://github.com/koles/ya-csv
  • https://github.com/davidgtonge/stream-convert (works, but is so exceedingly slow as to be useless, since I alter the dataset often; it took nearly 3 hours to parse a 60 MB CSV file)
  • https://github.com/cgiffard/CSVtoJSON.js
  • https://github.com/wdavidw/node-csv-parser (doesn't seem to be designed for converting csv to other formats)
  • https://github.com/voodootikigod/node-csv

I'm using Node 0.10.6 and would like a recommendation on how to accomplish this easily. Rolling my own might be best, but I'm not sure where to begin with all of Node's streaming features, especially since the API changed in 0.10.x.

asked May 17 '13 by neverfox


4 Answers

Check out the node.js csvtojson module, which can be used as a library, a command line tool, or a web server plugin: https://www.npmjs.org/package/csvtojson. The source code can be found at https://github.com/Keyang/node-csvtojson

Or install it from the npm registry:

npm install -g csvtojson

It supports CSV data of any size, field-type parsing, nested JSON output, and a bunch of other features.

Example

var Converter = require("csvtojson").core.Converter;
var fs = require("fs");

// constructResult:false turns off building the final result in memory,
// so the converter can stream; toArrayString streams out a normal JSON array.
var csvConverter = new Converter({ constructResult: false, toArrayString: true });

var readStream = fs.createReadStream("inputData.csv");
var writeStream = fs.createWriteStream("outputData.json");

readStream.pipe(csvConverter).pipe(writeStream);

You can also use it as a CLI tool:

csvtojson myCSVFile.csv
answered Sep 18 '22 by Keyang


While this is far from a complete answer, you may be able to base your solution on https://github.com/dominictarr/event-stream. An adapted example from the readme:

    var es = require('event-stream')

    es.pipeline(                          // connect streams together with pipe
      process.openStdin(),                // open stdin
      es.split(),                         // split the stream on newlines
      es.map(function (data, callback) {  // turn this async function into a stream
        // deal with one line of CSV data
        callback(null, JSON.stringify(parseCSVLine(data)))
      }),
      process.stdout
    )

After that, I expect you'll have one stringified JSON object per line. This then needs to be converted to an array, which you may be able to do by appending a comma to the end of every line, removing it from the last line, and then adding [ and ] to the beginning and end of the file.

The parseCSVLine function must be configured to assign the CSV values to the right object properties. This can be done fairly easily after parsing the first line of the file.
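A minimal sketch of such a parseCSVLine (hypothetical; assumes plain comma-separated values with no quoting or embedded commas, and that the first line passed in is the header row):

```javascript
// Hypothetical sketch: remembers the header row from the first line,
// then maps each subsequent line's values onto those column names.
// Assumes no quoted fields or embedded commas.
var headers = null

function parseCSVLine (line) {
  var values = line.toString().split(',')
  if (headers === null) {
    headers = values   // first line is the header row
    return null        // nothing to emit for the header itself
  }
  var obj = {}
  headers.forEach(function (key, i) {
    obj[key] = values[i]
  })
  return obj
}
```

Note that values come out as strings; if the fixture needs numbers, you'd have to coerce them yourself.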

I do notice the library is not tested on 0.10 (at least not with Travis), so beware. Maybe run npm test on the source yourself.

answered Sep 19 '22 by Myrne Stol


I recommend implementing the logic yourself. Node.js is actually pretty good at these kinds of tasks.

The following solution uses streams, since they won't blow up your memory.

Install Dependencies

npm install through2 split2 --save

Code

import fs from 'fs'
import through2 from 'through2'
import split2 from 'split2'

const parseCSV = () => {
  let templateKeys = []
  let parseHeadline = true
  return through2.obj((data, enc, cb) => {
    if (parseHeadline) {
      // First line: remember the column names and emit nothing
      templateKeys = data.toString().split(';')
      parseHeadline = false
      return cb(null, null)
    }
    // Subsequent lines: build an object keyed by the column names
    const entries = data.toString().split(';')
    const obj = {}
    templateKeys.forEach((el, index) => {
      obj[el] = entries[index]
    })
    return cb(null, obj)
  })
}

const processRecord = () => {
  return through2.obj(function (data, enc, cb) {
    // Implement your own processing logic here, e.g.:
    MyDB
      .insert(data)
      .then(() => cb())
      .catch(cb)
  })
}

fs.createReadStream('<yourFilePath>')
  // Read line by line
  .pipe(split2())
  // Parse each (semicolon-delimited) CSV line
  .pipe(parseCSV())
  // Process your records
  .pipe(processRecord())

For more information on this topic, see Stefan Baumgartner's excellent tutorial.

answered Sep 21 '22 by HaNdTriX


I found an even easier way to read CSV data using csvtojson.

Here's the code:

var Converter = require("csvtojson").Converter;
var converter = new Converter({});

converter.fromFile("sample.csv", function (err, result) {
  // Take the first five rows and wrap each in a { resultdata: ... } object
  var csvData = JSON.stringify([
    { resultdata: result[0] },
    { resultdata: result[1] },
    { resultdata: result[2] },
    { resultdata: result[3] },
    { resultdata: result[4] }
  ]);
  csvData = JSON.parse(csvData);
  console.log(csvData);
});

Or you can simply do this:

var Converter = require("csvtojson").Converter;
var converter = new Converter({});
converter.fromFile("sample.csv",function(err,result){ 
  console.log(result);
});

Here's the result from the first snippet:

[ { resultdata: 
     { 'Header 1': 'A_1',
       'Header 2': 'B_1',
       'Header 3': 'C_1',
       'Header 4': 'D_1',
       'Header 5': 'E_1' } },
  { resultdata: 
     { 'Header 1': 'A_2',
       'Header 2': 'B_2',
       'Header 3': 'C_2',
       'Header 4': 'D_2',
       'Header 5': 'E_2' } },
  { resultdata: 
     { 'Header 1': 'A_3',
       'Header 2': 'B_3',
       'Header 3': 'C_3',
       'Header 4': 'D_3',
       'Header 5': 'E_3' } },
  { resultdata: 
     { 'Header 1': 'A_4',
       'Header 2': 'B_4',
       'Header 3': 'C_4',
       'Header 4': 'D_4',
       'Header 5': 'E_4' } },
  { resultdata: 
     { 'Header 1': 'A_5',
       'Header 2': 'B_5',
       'Header 3': 'C_5',
       'Header 4': 'D_5',
       'Header 5': 'E_5' } } ]

The source of this code can be found at: https://www.npmjs.com/package/csvtojson#installation

I hope this gives you some ideas.

answered Sep 19 '22 by morz