efficiently serialise (and read) int array from nodejs

I'm considering building an application in nodejs which would need to stream large (>1 GB) files containing an array of integers. Crucially, the array needs to be serialised compactly, so not ASCII-based: ideally using 8 bits for the smaller integers (which would be the vast majority of the data) while still being able to represent larger numbers.
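(For illustration only, this is my own sketch rather than anything from the original post: a "variable-length quantity" encoding has exactly this property, with values below 128 fitting in a single byte and larger values spilling into extra bytes whose high bit marks continuation. The function name encodeVLQ is made up.)

function encodeVLQ(n) {
  // least significant 7-bit group goes last, with its high bit clear
  var bytes = [n % 128];
  n = Math.floor(n / 128);
  while (n > 0) {
    // earlier (more significant) groups carry the 0x80 continuation bit
    bytes.unshift((n % 128) | 0x80);
    n = Math.floor(n / 128);
  }
  return Buffer.from(bytes);
}

console.log(encodeVLQ(100));    // <Buffer 64>       - one byte
console.log(encodeVLQ(100000)); // <Buffer 86 8d 20> - three bytes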

This question is perhaps about more than nodejs, but how does one go about this in nodejs? Are there readily available solutions for streaming files with custom byte encodings from disk? Or, better still, for streaming integer arrays?

Ideally, decoding each part of the stream should be disk-bound rather than CPU-bound, even with an SSD.

asked Nov 11 '22 by Nat

1 Answer

I feel silly for not diving into the documentation first (the purpose of this project is for me to learn nodejs after all).

It turns out the default behaviour of the File System module is up to the job, though I haven't implemented the variable-length quantity decoding part or tested it for speed yet.

var fs = require('fs');
var rs = fs.createReadStream('/Path/to/file');
var bufferSize = 10;

// read() returns null until data has been buffered, so consume the
// stream from inside the 'readable' event rather than a busy loop
rs.on('readable', function () {
  var buffer, i, byte;
  while ((buffer = rs.read(bufferSize)) !== null) {
    for (i = 0; i < buffer.length; i++) {
      byte = buffer[i];
      // interpret byte as part of an integer according to the
      // 'variable-length quantity' encoding
    }
  }
});

http://en.wikipedia.org/wiki/Variable-length_quantity
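As a rough sketch of what that decoding could look like (my own untested code, not the gist mentioned below; it assumes the file is nothing but a sequence of VLQ-encoded unsigned integers, and the path is a placeholder):

var fs = require('fs');
var rs = fs.createReadStream('/Path/to/file'); // placeholder path
var value = 0; // integer currently being assembled across bytes

rs.on('readable', function () {
  var chunk, i, byte;
  while ((chunk = rs.read()) !== null) {
    for (i = 0; i < chunk.length; i++) {
      byte = chunk[i];
      // multiply rather than shift so values above 2^31 stay exact
      value = value * 128 + (byte & 0x7f);
      if ((byte & 0x80) === 0) { // high bit clear: this byte ends the number
        // 'value' is now a complete integer; push it, emit it, etc.
        value = 0;
      }
    }
  }
});

The per-byte work is just a mask, a compare, an add and a multiply, so the loop has a reasonable chance of staying disk-bound, but only profiling would confirm that.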

EDIT: I made a gist of the fully functioning script.

answered Nov 15 '22 by Nat