Node.js/Express and parallel queues

We are building an infrastructure that features a Node.js server with Express.

In the server, what happens is as follows:

  1. The server accepts an incoming HTTP request from a client.
  2. The server generates two files (this operation can be "relatively long", meaning around 0.1 seconds or so).
  3. The server uploads the generated files (~20-200 KB each) to an external CDN.
  4. The server responds to the client, including the URIs of the files on the CDN.

Currently the server does this sequentially for each request, and it works quite well (Node/Express handles concurrent requests automatically). However, as we plan to grow, the number of concurrent requests will increase, and we believe it would be better to implement a queue for processing requests. Otherwise, we risk having too many tasks running at the same time and too many open connections to the CDN. Responding to the client quickly is not a concern.
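For reference, the current sequential flow looks roughly like the sketch below; generateFiles and uploadToCDN are just placeholders standing in for our real implementation:

var express = require('express');
var app = express();

// Placeholders -- in the real server these generate the two files
// and upload them to the external CDN.
function generateFiles(req, callback) {
  setTimeout(function () { callback(null, ['file1', 'file2']); }, 100);
}

function uploadToCDN(files, callback) {
  setTimeout(function () {
    callback(null, files.map(function (f) { return 'https://cdn.example.com/' + f; }));
  }, 100);
}

app.get('/generate', function (req, res) {
  // 2. Generate the two files (can take ~0.1 seconds)
  generateFiles(req, function (err, files) {
    if (err) return res.status(500).end();

    // 3. Upload the generated files to the CDN
    uploadToCDN(files, function (err, uris) {
      if (err) return res.status(500).end();

      // 4. Respond to the client with the URIs on the CDN
      res.json({ uris: uris });
    });
  });
});

app.listen(3000);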

What I was thinking of is having a separate part of the Node server that contains a few "workers" (2-3, but we will run tests to determine the right number of simultaneous operations). So, the new flow would look something like this:

  1. After accepting a request from the client, the server adds an operation to a queue.
  2. There are 2-3 workers (number to be tested) that take elements off the queue and perform all the operations (generate the files and upload them to the CDN).
  3. When a worker has processed an operation (it doesn't matter if it stays in the queue for a relatively long time), it notifies the Node server (a callback), and the server responds to the client (which has been waiting in the meantime).

What do you think of this approach? Do you believe it is the correct one?

Most importantly, HOW could this be implemented in Node/Express?

Thank you for your time

asked Feb 28 '14 by ItalyPaleAle


People also ask

Does NodeJS support parallelism?

Node can support parallelism via either the cluster or child_process modules packaged in the Node.js core API. Both of these modules create additional processes, not additional threads.
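For example, a minimal sketch of the child_process approach (the message text here is purely illustrative):

var child_process = require('child_process');

if (process.send) {
  // This branch runs in the forked child: a separate OS process with its own PID.
  process.send('hello from child pid ' + process.pid);
  process.disconnect();
} else {
  // This branch runs in the parent process.
  var child = child_process.fork(__filename);
  child.on('message', function (msg) {
    console.log('parent pid ' + process.pid + ' received: ' + msg);
  });
}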

Is NodeJS concurrent or parallel?

At a high level, Node.js falls into the category of concurrent computation. This is a direct result of the single-threaded event loop being the backbone of a Node.js application.

What is the difference between Express and NodeJS?

NodeJS is an event-driven, non-blocking I/O model using JavaScript as its main language. It helps build scalable network applications. Express is a minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications.

What is Express () in NodeJS?

Express is a Node.js web application framework that provides broad features for building web and mobile applications. It is used to build single-page, multi-page, and hybrid web applications. It's a layer built on top of Node.js that helps manage servers and routes.


2 Answers

tl;dr: You can use the native Node.js cluster module to handle a lot of concurrent requests.

Some preamble: Node.js per se is single-threaded. Its event loop is what makes it excellent at handling multiple requests simultaneously even with its single-threaded model, which is one of its best features IMO.

The real deal: So, how can we scale this to handle even more concurrent connections and use all available CPUs? With the cluster module.

This module works exactly as pointed out by @Qualcuno: it allows you to create multiple workers (i.e., processes) behind the master to share the load and use the available CPUs more efficiently.

According to the official Node.js documentation:

Because workers are all separate processes, they can be killed or re-spawned depending on your program's needs, without affecting other workers. As long as there are some workers still alive, the server will continue to accept connections.

The required example:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork workers.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // Workers can share any TCP connection.
  // In this case it's an HTTP server.
  http.createServer(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
  }).listen(8000);
}

Hope this is what you need.

Comment if you have any further questions.

answered Sep 26 '22 by Diosney


(Answering my own question)

According to this question on Stack Overflow, a solution in my case would be to implement a queue using Caolan McMahon's async module.

The main application creates jobs and pushes them into a queue, which has a limit on the number of jobs that can run concurrently. This allows tasks to be processed in parallel while keeping strict control over that limit. It works like Cocoa's NSOperationQueue on Mac OS X.
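A rough sketch of how this could look with async.queue is below. As in the sketch in the question, generateFiles and uploadToCDN are placeholders for the real implementation, and the concurrency of 2 is just a starting value to be tuned:

var express = require('express');
var async = require('async');
var app = express();

// Placeholders for the real file-generation and CDN-upload steps.
function generateFiles(req, callback) {
  setTimeout(function () { callback(null, ['file1', 'file2']); }, 100);
}

function uploadToCDN(files, callback) {
  setTimeout(function () {
    callback(null, files.map(function (f) { return 'https://cdn.example.com/' + f; }));
  }, 100);
}

// The queue's worker generates the files and uploads them to the CDN.
// The second argument (2) is the maximum number of jobs processed at once.
var queue = async.queue(function (task, done) {
  generateFiles(task.req, function (err, files) {
    if (err) return done(err);
    uploadToCDN(files, done);
  });
}, 2);

app.get('/generate', function (req, res) {
  // The job waits in the queue; the per-task callback fires when the worker
  // finishes, with the arguments the worker's callback was called with.
  queue.push({ req: req }, function (err, uris) {
    if (err) return res.status(500).end();
    res.json({ uris: uris });
  });
});

app.listen(3000);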

answered Sep 23 '22 by ItalyPaleAle