I am working on a Node.js application. There is a specific RESTful API endpoint (GET) that, when triggered by the user, requires the server to perform about 10-20 network operations to pull information from different sources. All of these network operations use async callbacks, and once they have ALL finished, the Node.js app consolidates the results and sends them back to the client. All of these operations are started in parallel via the async.map function.
I just want to understand, since nodejs is single threaded, and it does not make use of multi-core machines (at least not without clustering), how does node scale when it has many callbacks to process? Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel as well as the main thread?
The reason I ask is that I see the performance of my 20 callbacks deteriorate from the first callback to the last one. For example, the first network operation (out of the 10-20) takes 141ms to complete, whereas the last one takes about 4 seconds (measured as the time from when the function is executed, until the callback of the function returns a value or an error). They are all the same network operation hitting the same data source, so the data source is not the bottleneck. I know for a fact that the data source takes no more than 200ms to respond to a single request.
I found this thread, so it looks to me that the one single thread needs to address all callbacks AND new requests coming up.
So my question is, for operations that will trigger many callbacks, what is the best practice in optimizing their performance?
For network operations node.js is effectively single threaded. However, there is a persistent misunderstanding that handling I/O requires constant CPU resources. The core of your question boils down to:
Does the actual processing of callbacks depend on node's single thread being idle, or are callbacks processed in parallel as well as the main thread?
The answer is yes and no. Yes, callbacks are only executed when the main thread is idle. No, the "processing" is not done while the thread is idle. To be specific: there is no "processing" - it takes zero CPU time for node to "process" thousands of callbacks if what you mean by "process" is waiting.
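The "yes" half is easy to see for yourself. Here's a minimal sketch (the demo function and the 50 ms busy loop are made up for illustration): a timer that is due immediately still has to wait until the synchronous code has finished.

```javascript
// Sketch: a callback that is already due still waits for the main thread to go idle.
function demo(done) {
  const order = [];

  // This timer is due after 0 ms, so it is "ready" almost immediately.
  setTimeout(() => {
    order.push('callback ran');
    done(order);
  }, 0);

  // Keep the main thread busy for about 50 ms of synchronous work.
  const start = Date.now();
  while (Date.now() - start < 50) { /* busy */ }
  order.push('sync work finished');
}

demo(order => console.log(order));
// prints [ 'sync work finished', 'callback ran' ]
```

The timer expired long before the loop ended, but node could only call it once the currently running code returned to the event loop.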
If we really need to understand how node (or browser) internals work we must unfortunately first understand how computers work - from the hardware to the operating system. Yes, this is going to be a deep dive so bear with me.
It all began with the invention of interrupts...
It was a great invention, but also a Box of Pandora - Edsger Dijkstra
Yes, the quote above is from the same "Goto considered harmful" Dijkstra. From the very beginning introducing asynchronous operation to computer hardware was considered a very hard topic even for some of the legends in the industry.
Interrupts were introduced to speed up I/O operations. Rather than needing to poll some input with software (taking CPU time away from useful work) the hardware will send a signal to the CPU to tell it an event has occurred. The CPU will then suspend the currently running program and execute another program to handle the interrupt - thus we call these functions interrupt handlers. And the word "handler" has stuck all the way up the stack to GUI libraries which call callback functions "event handlers".
If you've been paying attention you will notice that this concept of an interrupt handler is actually a callback. You configure the CPU to call a function at some later time when an event happens. So even callbacks are not a new concept - they're way older than C.
Interrupts make modern operating systems possible. Without interrupts there would be no way for the CPU to temporarily stop your program to run the OS (well, there is cooperative multitasking, but let's ignore that for now). The way an OS works is that it sets up a hardware timer in the CPU to trigger an interrupt and then it tells the CPU to execute your program. It is this periodic timer interrupt that runs your OS. Apart from the timer, the OS (or rather device drivers) sets up interrupts for I/O. When an I/O event happens the OS will take over your CPU (or one of your CPUs in a multi-core system) and check its data structures to decide which process it needs to execute next to handle the I/O (this is called preemptive multitasking).
So, handling network connections is not even the job of the OS - the OS just keeps track of connections in its data structures (or rather, the networking stack). What really handles network I/O is your network card, your router, your modem, your ISP etc. So waiting for I/O takes zero CPU resources. It just takes up some RAM to remember which program owns which socket.
Now that we have a clear picture of this we can understand what it is that node does. Various OSes have various different APIs that provide asynchronous I/O - from overlapped I/O on Windows to poll/epoll on Linux to kqueue on BSD to the cross-platform select(). Node internally uses libuv as a high-level abstraction over these APIs.
How these APIs work is similar, though the details differ. Essentially they provide a function that when called will block your thread until the OS sends an event to it. So yes, even non-blocking I/O blocks your thread. The key here is that blocking I/O will block your thread in multiple places but non-blocking I/O blocks your thread in only one place - where you wait for events.
What this allows you to do is design your program in an event-oriented manner. This is similar to how interrupts allow OS designers to implement multitasking. In effect, asynchronous I/O is to frameworks what interrupts are to OSes. It allows node to spend exactly 0% CPU time to process (wait for) I/O. This is what makes node fast - it's not really faster but does not waste time waiting.
With the understanding we now have of how node handles network I/O we can understand how callbacks affect performance.
There is zero CPU penalty having thousands of callbacks waiting
Of course, node still needs to maintain data structures in RAM to keep track of all the callbacks so callbacks do have memory penalty.
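If you want to check this on your own machine, something like the following rough sketch should do. The function name and the numbers are made up, and plain timers stand in for outstanding network requests, so treat it as an approximation rather than a benchmark. It compares the CPU time the process used with the wall-clock time that passed while 10,000 callbacks were pending.

```javascript
// Sketch: many pending callbacks cost memory, not CPU, while they wait.
function measureIdleCpu(pendingCount, waitMs, done) {
  let fired = 0;
  for (let i = 0; i < pendingCount; i++) {
    // Each timer stands in for an outstanding network request (assumption).
    setTimeout(() => { fired++; }, waitMs * 2);
  }

  const cpuStart = process.cpuUsage();   // user + system time in microseconds
  const wallStart = Date.now();

  // Probe halfway through, while every callback above is still pending.
  setTimeout(() => {
    const cpu = process.cpuUsage(cpuStart);
    done({
      pending: pendingCount - fired,
      wallMs: Date.now() - wallStart,
      cpuMs: (cpu.user + cpu.system) / 1000,
    });
  }, waitMs);
}

measureIdleCpu(10000, 200, stats => console.log(stats));
```

I'd expect cpuMs to come out as a small fraction of wallMs, but the exact figures depend on your machine and node version.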
Processing the return value from callbacks is done in a single thread
This has some advantages and some drawbacks. It means node does not have to worry about race conditions and thus node does not internally use any semaphores or mutexes to guard data access. The disadvantage is that any CPU intensive javascript will block all other operations.
You mention that:
I see the performance of my 20 callbacks deteriorate from the first callback to the last one
The callbacks are all executed sequentially and synchronously in the main thread (only the waiting is actually done in parallel). Thus it could be that your callbacks are doing some CPU intensive calculations and the total execution time of all the callbacks is actually 4 seconds.
However, I rarely see this kind of issue for that number of callbacks. It's still possible, since I don't know what you're doing in your callbacks. I just think it's unlikely.
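For what it's worth, here is a minimal sketch of how CPU work inside callbacks would produce exactly the pattern you describe. Everything in it is hypothetical: fakeRequest stands in for your network call, and burnCpu stands in for whatever processing your callbacks do.

```javascript
// Sketch: 20 "requests" that all answer at about the same moment, but whose
// callbacks each burn CPU. Measured latency grows from the first to the last.
function burnCpu(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) { /* synchronous work blocks the event loop */ }
}

// Hypothetical stand-in for a network call that answers after delayMs.
function fakeRequest(delayMs) {
  return new Promise(resolve => setTimeout(resolve, delayMs));
}

function measureLatencies(count, delayMs, cpuMsPerCallback) {
  const tasks = Array.from({ length: count }, async () => {
    const started = Date.now();
    await fakeRequest(delayMs);      // the waiting happens in parallel
    burnCpu(cpuMsPerCallback);       // the processing happens one at a time
    return Date.now() - started;     // same measurement as in the question
  });
  return Promise.all(tasks);
}

measureLatencies(20, 50, 20).then(latencies => {
  console.log('first:', latencies[0], 'ms, last:', latencies[latencies.length - 1], 'ms');
});
```

Every fake request answers after about 50 ms, yet the last measurement also includes the CPU time of all the callbacks that ran before it. If timing your own callbacks shows nothing like this, CPU work is probably not your problem.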
You also mention:
until the callback of the function returns a value or an error
One likely explanation is that your network resource cannot handle that many simultaneous connections. You may not think it's much since it's only 20 connections but I've seen plenty of services that would crash at 10 requests/second. The problem is all 20 requests are simultaneous.
You can test this by taking node out of the picture and using a command line tool to send 20 simultaneous requests. Something like curl or wget:
    # assuming you're running bash:
    for x in `seq 1 20`; do
        curl -o /dev/null -w "Connect: %{time_connect} Start: %{time_starttransfer} Total: %{time_total} \n" http://example.com &
    done
If it turns out that making the 20 requests simultaneously is stressing the other service, what you can do is limit the number of simultaneous requests.
You can do this by batching your requests:
    // fetchNetworkResource is a placeholder for your own request function
    async function fetchInBatches () {
        let input = [/* some values we need to process */];
        let result = [];
        while (input.length) {
            let batch = input.splice(0,3); // make 3 requests in parallel
            // wait for the whole batch before starting the next one
            let batchResult = await Promise.all(batch.map(x => {
                return fetchNetworkResource(x);
            }));
            result = result.concat(batchResult);
        }
        return result;
    }
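One thing to be aware of with batching is that each batch waits for its slowest request before the next batch starts. If that matters, a sliding-window variant keeps a fixed number of requests in flight at all times. Below is a minimal sketch; mapWithLimit is a name I made up, and the worker you pass in would be your own request function. Since you're already using the async library, its mapLimit function does the same job for callback-style code.

```javascript
// Sketch: keep at most `limit` requests in flight and start the next one
// as soon as any of them finishes. Results keep the order of `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const index = next++;   // claim an item; safe because JS is single threaded
      results[index] = await worker(items[index]);
    }
  }

  // Start `limit` runners that pull items from the shared counter.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Usage with the placeholder from the batching example:
// const result = await mapWithLimit(input, 3, fetchNetworkResource);
```

If any worker call rejects, the returned promise rejects with that error, so add your own error handling if you need partial results.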