I am using Node.js for a CPU-intensive task, which basically generates a large amount of data and stores it in a file. For a single type of data, I stream the data to the output file as it is generated.
Aim: I want to generate this data for multiple types of data in parallel (making the best use of my multi-core CPU), without each process having its own heap memory, so that the task has more memory available and executes faster.
I was planning to use node-fibers, which Meteor also uses for its own callback handling. But I am not sure this will achieve what I want: at the end of one of his videos on Meteor fibers, Chris Mather mentions that everything is ultimately single-threaded, and that node-fibers somehow manages the same single-threaded event loop to provide its functionality.
So,
Does this mean that if I use node-fibers, my tasks won't run in parallel and so won't utilize my CPU cores?
Will node webworker-threads help me achieve the functionality I want? The module's home page says that webworker-threads run on separate/parallel CPU processes, thus providing multi-threading in the real sense.
As a final question: does this mean that Node.js is not advisable for such CPU-intensive tasks?
Note: I don't want to use async code-structuring libraries that are presented as threads but in fact just add syntactic sugar over the same async code, as the tasks are largely CPU-intensive. I have already used the async capabilities to the max.
// Update 1 (based on the answer about clusters)
Sorry, I forgot to mention this, but the problems I faced with clusters are:
It is complex to load-balance my work in a way that ensures a particular set of parallel tasks executes before certain other tasks.
I am not sure clusters really do what I want, given these lines on the webworker-threads npm home page:
The "can't block the event loop" problem is inherent to Node's evented model. No matter how many Node processes you have running as a Node-cluster, it won't solve its issues with CPU-bound tasks.
... any light on how ... would be helpful.
Rather than trying to implement multiple threads, you should find it much easier to use multiple processes with Node.js.
See, for example, the cluster module. This allows you to easily run the same JS code in multiple processes, e.g. one per core, and collect their results / be notified once they've completed.
If cluster does more than you need, then you can also just call fork directly.
If you must have thread-parallelism rather than process-parallelism, then you may want to look at writing an async native module. Then you have access to the libuv thread pool (though starving it may reduce I/O performance) or can spawn your own threads as you wish (but then you're on your own for synchronising with the rest of Node).
After update 1
For load balancing, if what cluster does isn't working for you, then you can just do it yourself with fork, as I mentioned. The source for cluster is available.
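One way to do the ordering yourself, sketched under assumptions: group the tasks into phases, run the phases strictly one after another, and run the tasks inside each phase in parallel with a cap on how many are in flight. The names `runWithLimit`, `runPhases` and `forkJob` are hypothetical, and `forkJob` assumes a worker script that replies with exactly one message per job.

```javascript
const { fork } = require('child_process');

// Run async task functions with at most `limit` in flight at once.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function lane() {
    // Each lane keeps pulling the next unclaimed task until none are left.
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const lanes = Array.from({ length: Math.min(limit, tasks.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Run phases strictly in order; tasks inside one phase run in parallel.
async function runPhases(phases, limit) {
  const all = [];
  for (const phase of phases) {
    all.push(await runWithLimit(phase, limit));
  }
  return all;
}

// Hypothetical helper: run one job in a child process. Assumes the script
// at `workerPath` listens for one message and answers with one message.
function forkJob(workerPath, job) {
  return new Promise((resolve, reject) => {
    const child = fork(workerPath);
    child.once('message', (result) => {
      child.kill(); // the worker is finished once it has replied
      resolve(result);
    });
    child.once('error', reject);
    child.send(job);
  });
}

// Usage sketch (worker.js and the job shapes are assumptions):
// runPhases(
//   [
//     [() => forkJob('./worker.js', { type: 'a' }), () => forkJob('./worker.js', { type: 'b' })],
//     [() => forkJob('./worker.js', { type: 'c' })], // starts only after a and b finish
//   ],
//   require('os').cpus().length
// );
```

Capping the limit at the number of cores keeps every core busy without oversubscribing the machine.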
For the other point, it means if the task is truly CPU-bound then there's no advantage Node will give you over other technologies, other than being simpler if everything else is using Node. The only option you have is to make sure you're using all the available CPU resources, which a worker pool will give you. If you're already using Node then the easiest options are using the ones it's already got (cluster or libuv). If they're not sufficient then yeah, you'll have to find something else.
Regardless of technology, it remains true that multi-process parallelism is a lot easier than multi-thread parallelism.
Note: despite what you say, you definitely do want to use async code precisely because the work is CPU-intensive; otherwise your tasks will block all I/O. You do not want this to happen.
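To make that concrete, here is a minimal sketch of keeping a CPU-heavy loop async: the work is split into chunks and the function yields to the event loop between chunks, so pending I/O callbacks (such as stream writes) get a chance to run. The function names and the sum-of-squares workload are placeholders, not your actual generator.

```javascript
// Resolve on the next turn of the event loop, letting queued I/O run first.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Placeholder workload: sum the squares 1..total, `chunkSize` items at a time.
async function sumSquaresInChunks(total, chunkSize) {
  let sum = 0;
  for (let start = 1; start <= total; start += chunkSize) {
    const end = Math.min(start + chunkSize - 1, total);
    for (let i = start; i <= end; i++) sum += i * i; // one synchronous chunk
    await yieldToEventLoop(); // give I/O callbacks a turn before continuing
  }
  return sum;
}
```

The chunk size is a trade-off: larger chunks mean less scheduling overhead, smaller chunks mean I/O waits less between turns.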