 

Handle long-running processes in NodeJS?

I've seen some older posts touching on this topic, but I want to know what the current, modern approach is.

The use case is: (1) you want to run a long-running task that can take up to 60 seconds, for example processing a video file or running jspm install; (2) you can NOT subdivide the task.

Other requirements include:

  • need to know when a task finishes
  • nice to be able to stop a running task
  • stability: if one task dies, it doesn't bring down the server
  • needs to be able to handle 100s of simultaneous requests
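The first three requirements (knowing when a task finishes, stopping it, and not losing the server when it dies) map fairly directly onto Node's built-in child_process module. Below is a minimal sketch, assuming the job can be started as its own Node.js process; TASK_SCRIPT and runTask are hypothetical names, and the busy loop only simulates the real 60-second work.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical stand-in for the real long-running job (e.g. jspm install):
// a CPU-bound busy loop executed by a *separate* Node.js process.
const TASK_SCRIPT = `
  if (process.env.TASK_CRASH === "1") throw new Error("task failed");
  const end = Date.now() + Number(process.env.TASK_MS);
  while (Date.now() < end) {} // work that cannot be subdivided
`;

function runTask(ms, { crash = false } = {}) {
  const child = spawn(process.execPath, ["-e", TASK_SCRIPT], {
    env: { ...process.env, TASK_MS: String(ms), TASK_CRASH: crash ? "1" : "0" },
    stdio: "ignore", // discard the child's output, including crash stack traces
  });
  // "Know when a task finishes": settle once the child exits for any reason.
  const done = new Promise((resolve, reject) => {
    child.once("error", reject); // e.g. the process could not be started
    child.once("exit", (code, signal) => resolve({ code, signal }));
  });
  // "Stop a running task": ask the OS to terminate the child.
  const cancel = () => child.kill("SIGTERM");
  return { done, cancel };
}

// One task completes, one is cancelled, one crashes; the parent keeps running.
const finished = runTask(200);
const cancelled = runTask(60_000);
setTimeout(() => cancelled.cancel(), 100);
const crashed = runTask(0, { crash: true });

const demo = Promise.all([finished.done, cancelled.done, crashed.done]);
demo.then((results) => console.log(results));
```

Because each task is an OS process, a crash reaches the parent only as a non-zero exit code. Handling hundreds of requests additionally needs a cap on how many children run at once, since hundreds of CPU-bound processes on a few cores just make each other slower.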

I've seen these solutions mentioned:

  • nodejs child process
  • webworkers
  • fibers - not used for CPU-bound tasks
  • generators - not used for CPU-bound tasks
  • https://adambom.github.io/parallel.js/
  • https://github.com/xk/node-threads-a-gogo
  • any others?
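Of the options listed, the closest thing to web workers in current Node.js is the built-in worker_threads module, which postdates this question. A minimal sketch, with a hypothetical summing loop standing in for the real job and the worker source passed inline (eval: true) only to keep everything in one file:

```javascript
const { Worker } = require("node:worker_threads");

// Source of a hypothetical CPU-bound job. eval: true lets us pass it as a string;
// normally it would live in its own module file.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require("node:worker_threads");
  let sum = 0;
  for (let i = 0; i < workerData.iterations; i++) sum += i % 7;
  parentPort.postMessage(sum);
`;

function runOnThread(iterations) {
  const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: { iterations } });
  const result = new Promise((resolve, reject) => {
    worker.once("message", resolve); // the task finished and posted its result
    worker.once("error", reject); // an uncaught exception inside the worker
    worker.once("exit", (code) => {
      // Non-zero exit: the worker was terminated (code 1) or failed.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
  // terminate() stops the thread even in the middle of a loop.
  return { result, stop: () => worker.terminate() };
}

const quick = runOnThread(70);
quick.result.then((sum) => console.log("sum:", sum)); // sum: 210

const endless = runOnThread(1e12);
setTimeout(() => endless.stop(), 50);
endless.result.catch((err) => console.log("stopped:", err.message));
```

Threads are typically lighter than full processes, but they share one process: an uncaught JavaScript exception in a worker only emits 'error', yet a native crash in any thread still takes the whole server down, so processes remain the safer fit for the stability requirement.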

Which is the modern, standards-based approach? Also, if Node.js isn't suited for this type of task, then that's also a valid answer.

U Avalos asked Oct 06 '15 16:10





2 Answers

The short answer is: it depends.

If you mean a Node.js server that runs the task itself, then the answer for this use case is no. Node.js's single-threaded event loop can't run CPU-bound tasks without blocking every other request, so it makes sense to offload the work to another process or thread. And because each CPU-bound task here runs for a long time, you also need some way of queueing tasks, i.e., a worker queue.
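To see why a CPU-bound task can't simply run inside the server process: while synchronous code runs, nothing else on the event loop (timers, incoming requests) gets a turn. A minimal, self-contained demonstration; busyWait is a hypothetical stand-in for the real work.

```javascript
// While a synchronous loop runs, the event loop cannot fire timers or
// serve other requests: everything waits for the loop to finish.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // stand-in for CPU-bound work
}

const scheduledAt = Date.now();
let timerDelay;
setTimeout(() => {
  // Asked to run after 10 ms, but it can only run once busyWait() returns.
  timerDelay = Date.now() - scheduledAt;
  console.log(`timer fired after ${timerDelay} ms`);
}, 10);

busyWait(500); // blocks the only JavaScript thread for about 500 ms
```

In a server, every pending HTTP request would be stuck in the same way as this timer, for the full 60 seconds of the real task.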

That said, since this particular task is itself JS code (the jspm API), it makes sense for the worker queue to run on Node.js too. Hence, the solution is: (1) use a Node.js server that does nothing but add tasks to the worker queue; (2) use a Node.js worker queue (like kue) to do the actual work, with cluster to spread that work across the CPUs. The result is a simple, single server that can handle hundreds of requests without choking. (Well, almost; see the note below.)
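The architecture above (a thin server that only enqueues, plus a fixed pool of workers doing the actual work) can be sketched without Redis. This is not kue's API: JobQueue is a hypothetical in-memory class that caps concurrency at one child process per CPU core by default, roughly the effect kue plus cluster aims for, and each job is a simulated busy loop.

```javascript
const { spawn } = require("node:child_process");
const os = require("node:os");

// Hypothetical in-memory stand-in for a worker queue such as kue. kue keeps jobs
// in Redis, so they survive restarts and can be shared by several servers; this
// sketch keeps them in memory and runs everything on one machine.
class JobQueue {
  constructor(concurrency = os.cpus().length) {
    this.concurrency = concurrency; // about one worker process per CPU core
    this.pending = []; // jobs waiting for a free slot
    this.running = 0;
    this.peak = 0; // highest number of jobs seen running at once
    this.nextId = 1;
  }

  // What the front-end server calls: returns at once with an id and a promise.
  enqueue(ms) {
    const id = this.nextId++;
    const finished = new Promise((resolve) => this.pending.push({ id, ms, resolve }));
    this.drain();
    return { id, finished };
  }

  // Start queued jobs until every slot is busy.
  drain() {
    while (this.running < this.concurrency && this.pending.length > 0) {
      const job = this.pending.shift();
      this.running++;
      this.peak = Math.max(this.peak, this.running);
      // Each job runs in its own process (a busy loop simulating the real work),
      // so a crashing job fails alone instead of taking the server down.
      const script = `const end = Date.now() + ${job.ms}; while (Date.now() < end) {}`;
      const child = spawn(process.execPath, ["-e", script], { stdio: "ignore" });
      let settled = false;
      const finish = (code) => {
        if (settled) return;
        settled = true;
        this.running--;
        job.resolve({ id: job.id, code });
        this.drain(); // a slot is free again: pull the next job
      };
      child.once("error", () => finish(null));
      child.once("exit", (code) => finish(code));
    }
  }
}

// Five jobs, two slots: jobs 3 to 5 wait in the queue until a slot frees up.
const queue = new JobQueue(2);
const jobs = [1, 2, 3, 4, 5].map(() => queue.enqueue(100));
const allDone = Promise.all(jobs.map((job) => job.finished));
allDone.then((results) => console.log(results, "peak concurrency:", queue.peak));
```

Because this queue lives in memory, jobs are lost on restart and cannot be shared across machines; that is exactly what a DB-backed queue like kue adds.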

Note:

  • the above solution uses processes. I did not investigate thread solutions because it seems that these have fallen out of favor for node.
  • the worker queue + cluster give you the equivalent of a thread pool.
  • yes, in the worst case, the 100th parallel request will take 25 minutes to complete on a 4-core machine. The solution is to spin up another worker queue server (if I'm not mistaken, with a DB-backed worker queue like kue this is trivial: just point each server at the same DB).
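The 25-minute figure comes from treating the 4 cores as 4 slots that each work through one 60-second job at a time, so N simultaneous jobs finish in ⌈N/c⌉ rounds (a rough estimate that ignores process startup and queueing overhead):

```latex
t_{\text{worst}} = \left\lceil \frac{N}{c} \right\rceil t_{\text{job}},
\qquad
\left\lceil \frac{100}{4} \right\rceil \cdot 60\,\text{s} = 25 \cdot 60\,\text{s} = 1500\,\text{s} = 25\,\text{min}
```

With a second 4-core worker server on the same queue, c = 8 and the same formula gives ⌈100/8⌉ · 60 s = 13 · 60 s = 780 s, i.e. 13 minutes.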
U Avalos answered Sep 27 '22 05:09



You're describing a CPU-bound, long-running task, and that's definitely not something Node.js is built for. You also mention hundreds of simultaneous tasks.

You might take a look at something like the Gearman job server for that; it's a dedicated solution.

Alternatively, you can still have Node.js manage the requests, just not do the actual job execution.

If lower than optimal performance is acceptable and you want to keep your code in JavaScript, you can still do it, but you should have some sort of job queue; something like Redis or RabbitMQ comes to mind.

I think a job queue will be a must-have for long-running tasks arriving at hundreds per second, regardless of your runtime. The exception is if you can spawn these jobs on other servers/services/machines: then you don't care, because your Node.js API is just a front and management layer for the job cluster. In that case Node.js is perfectly fine for the job, your focus shifts to the job cluster itself, and that would make for a more focused question.
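A minimal sketch of Node.js as only the front and management layer, using the built-in http module: it accepts a job, replies 202 Accepted with an id right away, and lets clients poll for status. submitToJobCluster and the /jobs routes are hypothetical; the timer merely simulates the external job server (Gearman, a Redis- or RabbitMQ-backed queue, and so on).

```javascript
const http = require("node:http");
const { randomUUID } = require("node:crypto");

// Job status as seen by the front end. In a real setup it would come from the
// queue / job cluster, not from this process's memory.
const jobs = new Map(); // id -> { status: "queued" | "done" }

// Hypothetical hand-off to the external job cluster, simulated with a timer:
// the Node.js process only records the job and never runs the heavy work itself.
function submitToJobCluster(id) {
  setTimeout(() => jobs.set(id, { status: "done" }), 500);
}

const server = http.createServer((req, res) => {
  const send = (status, body) => {
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
  };

  if (req.method === "POST" && req.url === "/jobs") {
    const id = randomUUID();
    jobs.set(id, { status: "queued" });
    submitToJobCluster(id);
    return send(202, { id }); // 202 Accepted: the work continues after the reply
  }

  const match = req.method === "GET" ? req.url.match(/^\/jobs\/([\w-]+)$/) : null;
  if (match && jobs.has(match[1])) return send(200, jobs.get(match[1]));

  send(404, { error: "not found" });
});

server.listen(0, "127.0.0.1", () => {
  console.log(`front end listening on port ${server.address().port}`);
});
```

Clients POST /jobs, keep the returned id, and poll GET /jobs/<id>; the Node.js process stays responsive because it never does the heavy lifting.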

Node.js can still be useful here: it can help manage and hold those hundreds of tasks, depending on where they come from (i.e., you might only allow requests through to your job server for certain users, or limit the "pause" functionality to others, etc.).
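The gatekeeping described above (only some users may reach the job server, only some may pause) can live entirely in the Node.js front end. A minimal sketch with a hypothetical in-memory policy table; admit, release, and the policy fields are illustrative names, not part of any library.

```javascript
// Hypothetical policy table: who may send jobs to the job server, how many each
// user may have in flight, and who may use "pause".
const POLICIES = new Map([
  ["alice", { canSubmit: true, maxInFlight: 3, canPause: true }],
  ["bob", { canSubmit: true, maxInFlight: 1, canPause: false }],
  ["guest", { canSubmit: false, maxInFlight: 0, canPause: false }],
]);

const inFlight = new Map(); // user -> jobs currently forwarded to the job server

// Decide in the Node.js front end whether a request may reach the job server.
function admit(user, action) {
  const policy = POLICIES.get(user);
  if (!policy) return { ok: false, reason: "unknown user" };

  if (action === "pause") {
    return policy.canPause ? { ok: true } : { ok: false, reason: "pause not allowed" };
  }
  if (action === "submit") {
    if (!policy.canSubmit) return { ok: false, reason: "submit not allowed" };
    const count = inFlight.get(user) ?? 0;
    if (count >= policy.maxInFlight) return { ok: false, reason: "limit reached" };
    inFlight.set(user, count + 1); // call release() when the job finishes
    return { ok: true };
  }
  return { ok: false, reason: "unknown action" };
}

function release(user) {
  inFlight.set(user, Math.max(0, (inFlight.get(user) ?? 0) - 1));
}

console.log(admit("bob", "submit")); // { ok: true }
console.log(admit("bob", "submit")); // { ok: false, reason: 'limit reached' }
release("bob");
console.log(admit("bob", "submit")); // { ok: true }
console.log(admit("bob", "pause")); // { ok: false, reason: 'pause not allowed' }
```

Only requests that pass admit() would be forwarded to the job server; in a multi-server setup the in-flight counts would need to live in shared storage rather than in one process.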

Zlatko answered Sep 30 '22 05:09
