 

Buffering data from a stream in Node.js to perform bulk inserts

How can I efficiently buffer data from stream events in Node.js so I can bulk insert records instead of doing a single insert per record received from the stream? Here's the pseudo code I have in mind:

// Open MongoDB connection

mystream.on('data', (record) => {
   // bufferize data into an array
   // if the buffer is full (1000 records)
   // bulk insert into MongoDB and empty buffer
})

mystream.on('end', () => {
   // close connection
})

Does this look realistic? Are there possible optimizations? Are there existing libraries that facilitate this?
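For reference, here is a rough sketch of what that pseudo code could look like using the official mongodb driver; the connection URI, database and collection names, the batch size of 1000 and the flush helper are all illustrative:

const { MongoClient } = require('mongodb');

async function consume(mystream) {
  // Open MongoDB connection
  const client = new MongoClient('mongodb://localhost:27017'); // illustrative URI
  await client.connect();
  const collection = client.db('my_db').collection('my_collection');

  let buffer = [];

  // Insert the buffered records in one round trip, then empty the buffer
  const flush = async () => {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    await collection.insertMany(batch, { ordered: false });
  };

  mystream.on('data', (record) => {
    buffer.push(record);
    if (buffer.length >= 1000) {
      // Pause the source while flushing so the buffer cannot grow without bound
      mystream.pause();
      flush()
        .then(() => mystream.resume())
        .catch((err) => mystream.destroy(err));
    }
  });

  mystream.on('end', () => {
    // Insert whatever is left in the buffer, then close the connection
    flush()
      .then(() => client.close())
      .catch(console.error);
  });
}

The awkward part of this hand-rolled version is backpressure: the source has to be paused while a batch is being written, which is what the pause()/resume() calls above attempt to handle.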

asked Nov 06 '25 by dbrrt


1 Answer

Using Node.js's built-in stream library, this can be implemented concisely and efficiently:

const stream = require('stream');
const util = require('util');
const { MongoClient } = require('mongodb');

async function run(streamSource) {
  // `streamSource` is a Readable stream of objects from somewhere

  // Establish DB connection
  const client = new MongoClient("uri");
  await client.connect();

  // The specific collection to store our documents
  const collection = client.db("my_db").collection("my_collection");

  await util.promisify(stream.pipeline)(
    streamSource,
    new stream.Writable({
      objectMode: true,
      highWaterMark: 1000,
      writev: async (chunks, next) => {
        try {
          // Each buffered entry wraps the original object in `chunk`
          const documents = chunks.map(({ chunk }) => chunk);

          await collection.insertMany(documents, { ordered: false });

          next();
        } catch (error) {
          next(error);
        }
      }
    })
  );

  await client.close();
}
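Because the Writable is in objectMode with a highWaterMark of 1000, pipeline applies backpressure to streamSource once roughly 1000 objects are buffered, and writev is handed everything currently buffered in a single call, so each insertMany receives a batch rather than an individual document. pipeline also propagates errors and end-of-stream for you, so there is no manual pause/resume bookkeeping. A call site could look like this, where someObjectStream stands in for whatever Readable produces the records:

run(someObjectStream)
  .then(() => console.log('all documents inserted'))
  .catch(console.error);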
answered Nov 09 '25 by jorgenkg