 

How to load very large CSV files in Node.js?

Tags:

node.js

csv

I'm trying to load two big CSV files into Node.js; the first is 257,597 KB and the second is 104,330 KB. I'm using the filesystem (fs) and csv modules. Here's my code:

const fs = require('fs')
const csv = require('csv')

let myData

fs.readFile('path/to/my/file.csv', (err, data) => {
  if (err) console.error(err)
  else {
    csv.parse(data, (err, dataParsed) => {
      if (err) console.error(err)
      else {
        myData = dataParsed
        console.log('csv loaded')
      }
    })
  }
})

And after ages (1-2 hours) it just crashes with this error message:

<--- Last few GCs --->

[1472:0000000000466170]  4366473 ms: Mark-sweep 3935.2 (4007.3) -> 3935.2 (4007.3) MB, 5584.4 / 0.0 ms  last resort GC in old space requested
[1472:0000000000466170]  4371668 ms: Mark-sweep 3935.2 (4007.3) -> 3935.2 (4007.3) MB, 5194.3 / 0.0 ms  last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 000002BDF12254D9 <JSObject>
    1: stringSlice(aka stringSlice) [buffer.js:590] [bytecode=000000810336DC91 offset=94](this=000003512FC822D1 <undefined>,buf=0000007C81D768B9 <Uint8Array map = 00000352A16C4D01>,encoding=000002BDF1235F21 <String[4]: utf8>,start=0,end=263778854)
    2: toString [buffer.js:664] [bytecode=000000810336D8D9 offset=148](this=0000007C81D768B9 <Uint8Array map = 00000352A16C4D01>,encoding=000002BDF1...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::DecodeWrite
 2: node_module_register
 3: v8::internal::FatalProcessOutOfMemory
 4: v8::internal::FatalProcessOutOfMemory
 5: v8::internal::Factory::NewRawTwoByteString
 6: v8::internal::Factory::NewStringFromUtf8
 7: v8::String::NewFromUtf8
 8: std::vector<v8::CpuProfileDeoptFrame,std::allocator<v8::CpuProfileDeoptFrame> >::vector<v8::CpuProfileDeoptFrame,std::allocator<v8::CpuProfileDeoptFrame> >
 9: v8::internal::wasm::SignatureMap::Find
10: v8::internal::Builtins::CallableFor
11: v8::internal::Builtins::CallableFor
12: v8::internal::Builtins::CallableFor
13: 00000081634043C1

The biggest file loads, but Node runs out of memory on the other. Allocating more memory would probably be easy, but the main issue here is the loading time: it seems very long given the size of the files. So what is the correct way to do it? For comparison, Python loads these CSVs really fast with pandas (3-5 seconds).
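(For reference, the heap limit mentioned above can be raised with V8's --max-old-space-size flag, e.g., assuming a hypothetical script name:

node --max-old-space-size=8192 load-csv.js

but, as noted, that would not fix the loading time.)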

asked May 22 '18 by François MENTEC

3 Answers

Streams work perfectly; loading took only 3-5 seconds:

var fs = require('fs')
var csv = require('csv-parser')
var data = []

fs.createReadStream('path/to/my/data.csv')
  .pipe(csv())
  .on('data', function (row) {
    data.push(row)
  })
  .on('end', function () {
    console.log('Data loaded')
  })
answered Oct 12 '22 by François MENTEC


fs.readFile loads the entire file into memory, whereas fs.createReadStream reads the file in chunks of a size you can specify.

This prevents Node from running out of memory.
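A minimal sketch of the chunked read; the 64 KB highWaterMark is just an example value:

var fs = require('fs')

// read the file in 64 KB chunks instead of all at once
var stream = fs.createReadStream('path/to/my/file.csv', { highWaterMark: 64 * 1024 })

stream.on('data', function (chunk) {
  // chunk is a Buffer of at most 64 KB
  console.log('got', chunk.length, 'bytes')
})

stream.on('end', function () {
  console.log('done')
})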

answered Oct 12 '22 by JacobW


You may want to stream the CSV instead of reading it all at once:

  • csv-parse has streaming support (see the sketch below): http://csv.adaltas.com/parse/
  • or, you may want to take a look at csv-stream: https://www.npmjs.com/package/csv-stream
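A minimal sketch of the csv-parse streaming API, with column names taken from the header row (the require form below assumes an older csv-parse version where the parser is the default export):

var fs = require('fs')
var parse = require('csv-parse')   // newer versions: require('csv-parse').parse

fs.createReadStream('path/to/my/file.csv')
  .pipe(parse({ columns: true }))  // emit each record as an object keyed by the header row
  .on('data', function (record) {
    // handle one record at a time
  })
  .on('error', function (err) {
    console.error(err)
  })
  .on('end', function () {
    console.log('CSV fully parsed')
  })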
answered Oct 12 '22 by Haroldo_OK