Nodejs batch processing

A bit of a conceptual question.

I have 15 files (for example) that need to be processed, but I don't want to process them one at a time. Instead, I want to start processing 5 of them (any 5; the order is not important), and as soon as one of these 5 files is finished, another one should be started. The idea is to have at most 5 files being processed at the same time until all files are done.

I'm trying to work this out in Node, but in general I'm missing the idea of how this can be implemented.

Stefan Stoichev asked Dec 10 '22 14:12



2 Answers

A more accurate name for this type of processing might be 'limited parallel execution'. Mario Casciaro covers this well in his book Node.js Design Patterns, beginning on page 77. The pattern is useful when you want to control a set of parallel tasks that could otherwise cause excessive load. The example below is based on the one in his book.

Limited Parallel Execution Pattern

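function TaskQueue(concurrency) {
  this.concurrency = concurrency; // max number of tasks running at once
  this.running = 0;               // tasks currently in flight
  this.queue = [];                // tasks waiting for a free slot
}

// A task is a function that receives a `done` callback and calls it when finished.
TaskQueue.prototype.pushTask = function(task) {
  this.queue.push(task);
  this.next();
};

// Start queued tasks until the concurrency limit is reached.
TaskQueue.prototype.next = function() {
  var self = this;
  while (self.running < self.concurrency && self.queue.length) {
    var task = self.queue.shift();
    self.running++; // count the task before starting it, in case it finishes synchronously
    task(function(err) {
      // err is ignored here; log or handle it as needed
      self.running--;
      self.next(); // a slot freed up: start the next queued task
    });
  }
};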
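For comparison, the same limited-parallel idea can be expressed with promises. This is a minimal sketch, not code from the book, assuming a Node.js version with async/await (7.6+); `runLimited`, `sleep` and the `file-N.txt` names are hypothetical illustrations. Each task is a function that returns a promise, `running` plays the same role as in `TaskQueue` above, and the returned promise resolves with all results in input order.

```javascript
// Run promise-returning task functions with at most `limit` in flight at once.
// Resolves with the results in the same order as `tasks`; the first failure rejects.
function runLimited(tasks, limit) {
  return new Promise((resolve, reject) => {
    const results = new Array(tasks.length);
    let nextIndex = 0; // index of the next task to start
    let running = 0;   // tasks currently in flight
    let finished = 0;  // tasks that have completed successfully
    let failed = false;

    if (tasks.length === 0) {
      resolve(results);
      return;
    }

    const launch = () => {
      // Fill free slots until the limit is reached or no tasks remain.
      while (!failed && running < limit && nextIndex < tasks.length) {
        const i = nextIndex++;
        running++;
        Promise.resolve()
          .then(() => tasks[i]()) // a synchronous throw becomes a rejection
          .then(
            (value) => {
              results[i] = value;
              running--;
              finished++;
              if (finished === tasks.length) resolve(results);
              else launch(); // a slot freed up: start the next task
            },
            (err) => {
              failed = true; // stop launching new tasks; in-flight ones are not cancelled
              reject(err);
            }
          );
      }
    };

    launch();
  });
}

// Usage: 15 fake "files", each taking 50-200 ms; at most 5 are processed at once.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
let active = 0;
let maxActive = 0;
const files = Array.from({ length: 15 }, (_, i) => `file-${i + 1}.txt`);
const tasks = files.map((name) => async () => {
  active++;
  maxActive = Math.max(maxActive, active);
  await sleep(50 + Math.random() * 150); // stand-in for real file processing
  active--;
  return `${name} done`;
});

const run = runLimited(tasks, 5);
run.then((results) => {
  console.log(`${results.length} files processed; max concurrent: ${maxActive}`);
});
```

Wrapping each task as a function (rather than passing already-created promises) matters: a promise starts its work as soon as it is created, so only deferring the call lets the limiter decide when each file begins.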
KeyStroker answered Jan 01 '23 12:01



Here's a little example that simulates multiple workers reading from a central queue of work: https://jsfiddle.net/ctrlfrk/jsvyg69h/1/

// Fake "work": each job is simply a task that takes as many milliseconds as its value.
const workQueue = [1000,4000,2000,4000,5000,3000,7000,1000,9000,9000,4000,2000,1000,3000,8000,2000,3000,7000,6000,30000];

// A worker takes a job from the channel and asks for the next one only after
// the current job has finished, so each worker handles one job at a time.
const Worker = (name) => (channel) => {
  const history = [];
  const next = () => {
    const job = channel.getWork();
    if (job === undefined) { // Queue is empty: all done!
      console.log('Worker ' + name + ' completed');
      return;
    }
    history.push(job);
    console.log('Worker ' + name + ' grabbed new job: ' + job + '. History is:', history);

    // job is just the milliseconds; the global setTimeout works in Node and browsers
    setTimeout(next, job);
  };
  next();
};

// The channel hands out jobs from the shared queue (taken from the end).
const Channel = (queue) => {
  return { getWork: () => queue.pop() };
};

// The number of workers is the concurrency limit (4 here).
const channel = Channel(workQueue);
Worker('a')(channel);
Worker('b')(channel);
Worker('c')(channel);
Worker('d')(channel);
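The fiddle runs in a browser; a rough Node.js adaptation of the same workers-plus-shared-queue idea, written with async/await, might look like the sketch below. `processFile` is a hypothetical stand-in that only waits a random 20-100 ms (real code might call `fs.promises.readFile` there), `processAll` is an illustrative name, and the file names are made up.

```javascript
// Instrumentation so we can see how many files are in flight at once.
let inFlight = 0;
let peak = 0;

// Hypothetical stand-in for real per-file work.
async function processFile(name) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((r) => setTimeout(r, 20 + Math.random() * 80));
  inFlight--;
  return `${name} processed`;
}

// Start `workerCount` async workers that pull from one shared queue.
// A worker takes the next file only after finishing its current one,
// so at most `workerCount` files are processed at any moment.
async function processAll(files, workerCount) {
  const queue = files.slice(); // copy so the caller's array is untouched
  const processed = [];

  const worker = async (id) => {
    // JavaScript runs this on one thread, so shift() never hands
    // the same file to two workers.
    while (queue.length > 0) {
      const file = queue.shift();
      const result = await processFile(file);
      processed.push({ worker: id, file, result });
    }
  };

  const workers = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker(i));
  await Promise.all(workers); // resolves once every worker has drained the queue
  return processed;
}

const files = Array.from({ length: 15 }, (_, i) => `file-${i + 1}.txt`);
const run = processAll(files, 5);
run.then((processed) => {
  for (const p of processed) console.log(`worker ${p.worker} finished ${p.file}`);
});
```

In this variant the number of workers is the concurrency limit, so no running-task counter is needed: each worker simply loops until the queue is empty.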
david answered Jan 01 '23 12:01
