I have a Node app which accesses a static, large (>100M), complex, in-memory data structure, accepts queries, and then serves out little slices of that data to the client over HTTP.
Most queries can be answered in tenths of a second. Hurray for Node!
But, for certain queries, searching this data structure takes a few seconds. This sucks because everyone else has to wait.
To serve more clients efficiently, I would like to use some sort of parallelism.
But, because this data structure is so large, I would like to share it among the workers or threads or what have you, so I don't burn hundreds of megabytes per process. This would be perfectly safe, because the data structure is never written to after startup. A typical fork() in most other environments would handle this, since the child shares the parent's pages copy-on-write.
However, as far as I can tell, all the standard ways of doing parallelism in Node explicitly make this impossible. For safety, they don't want you to share anything.
But is there a way?
Background:
It is impractical to put this data structure in a database, or use memcached, or anything like that.
WebWorker API libraries and similar only allow short serialized messages to be passed in and out of the workers.
Node's cluster module uses a call named 'fork', but it is not really a fork of the existing process; it spawns a new one. So, once again, no shared memory.
Probably the correct answer would be to use filesystem-backed shared memory, i.e. tmpfs or mmap. There are some Node libraries that expose mount() and mmap() for exactly this kind of thing. Unfortunately, one then has to implement access to the complex data structure on top of synchronous seeks and reads. My application uses arrays of arrays of dicts and so on, and it would be nice not to have to reimplement all of that.
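To make the trade-off concrete, here is a minimal sketch of that approach using nothing but Node's fs module, assuming the structure has already been serialized into a flat binary file under /dev/shm (the path, offsets, and record layout below are made up for illustration). Every worker opens the same file read-only and pulls out only the bytes it needs, so the OS page cache holds the single shared copy:

```js
const fs = require('fs');

// Open the shared, read-only data file that lives on tmpfs (hypothetical path).
const fd = fs.openSync('/dev/shm/big-structure.bin', 'r');

// Read `length` bytes starting at `offset` and return them as a Buffer.
function readSlice(offset, length) {
  const buf = Buffer.alloc(length);
  fs.readSync(fd, buf, 0, length, offset);
  return buf;
}

// Example: fetch a 4-byte little-endian integer stored at byte offset 1024.
const value = readSlice(1024, 4).readUInt32LE(0);
console.log(value);
```

The pain point the question describes is exactly the part this sketch omits: translating "arrays of arrays of dicts" into fixed offsets and lengths is the re-implementation work you would rather avoid.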
I have tried writing a C/C++ binding for shared-memory access from Node.js: https://github.com/supipd/node-shm
It is still a work in progress (but working for me) and may be useful. If you find a bug or have a suggestion, let me know.
Most web applications spend the majority of their lives waiting for network buffers and database reads. Node.js is designed to excel at this I/O-bound work. If your work is truly CPU-bound, you might be better served by another platform.
With that out of the way...
Use process.nextTick (perhaps even nested blocks) to break expensive CPU work into chunks, so it is properly asynchronous and not allowed to block your thread. This ensures that one client making expensive requests doesn't negatively impact all the others.
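A minimal sketch of that chunking pattern follows; `hugeArray` and `matches` stand in for whatever structure and predicate the real query uses. Note that the sketch uses setImmediate rather than process.nextTick: in current Node, recursively queued nextTick callbacks run before pending I/O, so setImmediate is the safer way to yield between chunks.

```js
// Scan a large array in chunks, yielding to the event loop between chunks so
// other requests can be served while the search is in progress.
function searchInChunks(hugeArray, matches, callback) {
  const CHUNK = 10000;            // items examined per event-loop turn
  const results = [];
  let i = 0;

  function next() {
    const end = Math.min(i + CHUNK, hugeArray.length);
    for (; i < end; i++) {
      if (matches(hugeArray[i])) results.push(hugeArray[i]);
    }
    if (i < hugeArray.length) {
      setImmediate(next);         // yield, then continue with the next chunk
    } else {
      callback(null, results);    // done: hand back everything that matched
    }
  }

  next();
}
```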
Use Node's cluster module to add a worker process for each CPU in the system. Worker processes can all bind to a single HTTP port and use Memcached or Redis to share state. Workers also have a messaging API that can be used to keep an in-process memory cache synchronized, however it has some consistency limitations.
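Here is a minimal cluster sketch: one worker per CPU, all accepting connections on the same port. Each worker builds its own copy of the data structure (or reads shared state from Redis/Memcached); the port number is arbitrary.

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Spawn one worker per CPU. Note this launches new Node processes;
  // it is not a POSIX fork, so no memory is shared with the master.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log('worker %d died, restarting', worker.process.pid);
    cluster.fork();
  });
} else {
  // All workers listen on the same port; the master distributes connections.
  http.createServer((req, res) => {
    // Answer the query from this worker's own copy of the data.
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(8080);
}
```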