How can we set a total request timeout or drop requests in Node (when using the cluster module)?

This question seems innocent because we can read https://nodejs.org/api/http.html and spot the following options:

server.requestTimeout = 1;
server.keepAliveTimeout = 1;
server.timeout = 1;
server.setTimeout(1);

Easy, right? Not so fast!

The timeouts only start counting once the request is accepted by the request handler in a process. This is important because there are two places requests can queue up in Node before a request gets that far (the sketch after this list demonstrates the effect):

  1. If we're using the cluster module, the main process has an in-memory request queue. If workers are currently blocked (e.g. doing template rendering), requests chill out in here before being popped off into workers
  2. If we're not using the cluster module, and the process is blocked, requests can sit in a lower level socket buffer
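To see the effect, here's a minimal sketch (the 1s busy-wait, 500ms timeout, and port are arbitrary choices for illustration): requests that arrive while the loop is blocked sit outside the reach of the timeout, so they still succeed even after queueing longer than the timeout value.

const http = require('http');

const server = http.createServer((req, res) => {
  const end = Date.now() + 1000;
  while (Date.now() < end) {} // ~1s of synchronous work, blocking the event loop
  res.end(`[pid: ${process.pid}] handled\n`);
});

// A 500ms socket timeout. The timer only exists once Node has actually
// accepted the socket; a connection waiting in the kernel backlog behind
// a blocked event loop is invisible to it.
server.setTimeout(500);
server.listen(8000);

Fire two concurrent curls at this: both succeed, even though the second one spends ~1s queued, double the 500ms timeout.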

I can't seem to find any built-in APIs that let us set timeouts that begin counting when the request hits the socket (which I suppose makes sense; how would Node even know when that happens?). Even if we tag requests with a TIME_SENT header or something, so that a middleware can drop old requests "manually", we still have to wait some amount of time for the process to unblock before that middleware can run. Node can't do this for us out of band.
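For what that manual approach looks like, here's a sketch (the x-time-sent header name and the 500ms budget are made up for illustration):

const http = require('http');

const BUDGET_MS = 500;

const server = http.createServer((req, res) => {
  // Hypothetical header: epoch milliseconds stamped by the caller.
  const sentAt = Number(req.headers['x-time-sent']);
  if (sentAt && Date.now() - sentAt > BUDGET_MS) {
    // The caller has probably given up already; fail fast instead of
    // doing the (expensive) real work.
    res.statusCode = 503;
    res.end();
    return;
  }
  res.end('still within budget\n');
});

server.listen(8000);

The catch, as above: this check only runs once the event loop is free, so a stale request still occupies the queue until then.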

But maybe I'm wrong? Is there anything I'm missing?

maxConnections

We also have server.maxConnections: https://nodejs.org/api/net.html#net_server_maxconnections

Set this property to reject connections when the server's connection count gets high

The reason why we may want to set server.timeout in the first place is to stop the request queue from building up during peak times. Setting server.maxConnections should also achieve this!

With a single process, this setting does kiiiinda almost what we want.
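The linked server is presumably something like this sketch (the ~1s synchronous block per request and maxConnections = 1 are inferred from the output below):

const http = require('http');

const server = http.createServer((req, res) => {
  const end = Date.now() + 1000;
  while (Date.now() < end) {} // ~1s of blocking work per request
  res.end(`[pid: ${process.pid}] Request seen at: ${Math.floor(Date.now() / 1000)}`);
});

// Connections beyond the first are destroyed rather than queued.
server.maxConnections = 1;
server.listen(8000);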

Using a hello world server (https://i.fluffy.cc/FvGQHxMfKGScFcSkJBw1MRVL7x7hplB3.html), we can run:

for i in $(seq 1 10); do echo "${i}: "; curl localhost:8000 & done

Here's a script that does the same thing as the curl loop, but with fancier output:
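The script isn't reproduced above, but presumably it's along these lines (a sketch using child_process.exec and console.table; the fields are inferred from the output below):

const { exec } = require('child_process');

const run = () =>
  new Promise((resolve) => {
    const start = process.hrtime.bigint();
    exec('curl -sS localhost:8000', (err, stdout, stderr) => {
      resolve({
        success: !err,
        output: (stdout || stderr).trim(),
        timeTaken: Number(process.hrtime.bigint() - start) / 1e6, // ms
      });
    });
  });

Promise.all(Array.from({ length: 10 }, run)).then((results) => {
  console.table(results);
  const ok = results.filter((r) => r.success).length;
  console.log(`Results: ${ok} / ${results.length} requests succeeded`);
});

Running it against the blocked server gives: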

$ node test.js 
┌─────────┬─────────┬─────────────────────────────────────────────────────┬─────────────┐
│ (index) │ success │                       output                        │  timeTaken  │
├─────────┼─────────┼─────────────────────────────────────────────────────┼─────────────┤
│    0    │  true   │     '[pid: 23628] Request seen at: 1615192813'      │ 1015.566695 │
│    1    │  true   │     '[pid: 23628] Request seen at: 1615192814'      │ 2012.38064  │
│    2    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1011.044642 │
│    3    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1009.696098 │
│    4    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1007.128426 │
│    5    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1005.725304 │
│    6    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1004.444839 │
│    7    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1003.128797 │
│    8    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1001.813047 │
│    9    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1000.523041 │
└─────────┴─────────┴─────────────────────────────────────────────────────┴─────────────┘

Results: 2 / 10 requests succeeded

...and indeed we see the value respected!

However, notice that we have to wait for the blocking operation to yield before Node actually terminates the other requests. We only get the Recv failure after ~1 second, rather than immediately. Not ideal, but I'm not sure what else we could possibly do.

(Unless I'm missing something, and there is a magic out-of-band way to kill requests?)

Cluster Module

And what if we're tied to using the Node cluster module? Perhaps we're using a library that depends on it (e.g. hypernova).

Here's a clusterized hello world server.

Now we have one main process and 2 workers, each with server.maxConnections = 1.
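Presumably it looks something like this sketch (the worker count and per-worker limit are taken from the description above):

const cluster = require('cluster');
const http = require('http');

if (cluster.isMaster) {
  // Main process: just fork; it distributes connections to workers
  // (round-robin by default on most platforms).
  cluster.fork();
  cluster.fork();
} else {
  const server = http.createServer((req, res) => {
    const end = Date.now() + 1000;
    while (Date.now() < end) {} // same ~1s of blocking work
    res.end(`[pid: ${process.pid}] Request seen at: ${Math.floor(Date.now() / 1000)}`);
  });
  server.maxConnections = 1; // per worker
  server.listen(8000);       // shared with the main process via the cluster module
}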

When we run the same set of parallel requests, we see the following output:

$ node test.js 
┌─────────┬─────────┬─────────────────────────────────────────────────────┬─────────────┐
│ (index) │ success │                       output                        │  timeTaken  │
├─────────┼─────────┼─────────────────────────────────────────────────────┼─────────────┤
│    0    │  true   │      '[pid: 6246] Request seen at: 1615162602'      │ 1012.041639 │
│    1    │  true   │      '[pid: 6247] Request seen at: 1615162602'      │ 1009.773912 │
│    2    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1006.635283 │
│    3    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1006.259198 │
│    4    │  true   │      '[pid: 6246] Request seen at: 1615162603'      │ 2001.560547 │
│    5    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1001.442373 │
│    6    │  true   │      '[pid: 6247] Request seen at: 1615162603'      │ 2001.787769 │
│    7    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1998.214606 │
│    8    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1999.735996 │
│    9    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 998.881888  │
└─────────┴─────────┴─────────────────────────────────────────────────────┴─────────────┘

Results: 4 / 10 requests succeeded

Here's a visualization of this request timeline:

[image: request timeline]

Hold on! If we have 2 workers, each with server.maxConnections = 1, then if everything worked the same way we should still only process 2 requests total... but we see 4 requests processed! hmmm!

And unlike the single-process version, we have to wait ~2 seconds in some cases for requests at the back of the queue to be dropped. Why? Ideally they'd similarly be dropped at 1s.

I'm not sure why we see [request] [drop] [drop] [request] [drop] rather than:

  • [request] [request] [drop] [drop] [drop] or even;
  • [request] [drop] [drop] [drop] [request]

Any idea what's going on here?

(FWIW, here's a little deep dive into how the request queues appear to work at a lower level in the cluster module)

Addressing the XY Problem

  • "Why would you even need to want to do this? Just manually kill requests based on a request budget header!"

Yep, doing this anyway, but the calling client/mesh still shouldn't have to wait out the full timeout before it can retry its request against a different host

  • "Why are your workers blocked? Don't do blocking work in node!"

    When server side rendering, ReactDOM.renderToString() is synchronous :P

This is expected behaviour

It's not so much a mystery why this is happening as a question of how we can work around it and be a "good mesh citizen"

Perhaps a new API exposed via Node, so the main process in the cluster module has access to the request queue?

Current solutions I can think of involve setting up a proxy to manually broker requests and manage the request queue, but this is icky and subverts some of the wins of using the cluster module in the first place.
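For reference, the icky proxy version might look something like this sketch (the upstream address, the limits, and the status codes are all assumptions):

const http = require('http');

const MAX_IN_FLIGHT = 2;  // assumption: roughly the number of workers
const DEADLINE_MS = 1000; // total budget, counted from arrival at the proxy

let inFlight = 0;

http.createServer((clientReq, clientRes) => {
  if (inFlight >= MAX_IN_FLIGHT) {
    clientRes.statusCode = 503; // shed load immediately so the mesh can retry elsewhere
    clientRes.end();
    return;
  }
  inFlight++;

  const finish = (code) => {
    if (!clientRes.headersSent) clientRes.statusCode = code;
    if (!clientRes.writableEnded) clientRes.end();
  };

  const upstream = http.request({
    host: '127.0.0.1', // assumption: the clustered app listens here
    port: 8000,
    path: clientReq.url,
    method: clientReq.method,
    headers: clientReq.headers,
  }, (upstreamRes) => upstreamRes.pipe(clientRes));

  // This timer fires on time because the proxy process never blocks.
  const timer = setTimeout(() => { upstream.destroy(); finish(504); }, DEADLINE_MS);

  upstream.on('error', () => finish(502));
  upstream.on('close', () => { clearTimeout(timer); inFlight--; });

  clientReq.pipe(upstream);
}).listen(8080);

The whole point is that the proxy's event loop stays free, so its queue limit and deadline are enforced the moment a request arrives, rather than whenever a worker unblocks.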

Asked Mar 08 '21 by Mark


1 Answer

You might check out the answer given here, which describes roughly what you are trying to achieve (I believe), as far as accessing connections from the master process. It relies on some manual setup of the http server in the master process, which allows for more customized distribution of requests (or in your case, more customized load handling).

That answer doesn't provide any specifics about using maxConnections or requestTimeout to handle load and/or fail early, but:

  • if you set those values (maxConnections/requestTimeout) on the master process' http/s server (where in theory you're not executing any long-running/blocking tasks)
  • and if you can keep track of the state of each worker (and their requests in progress) from the master

you might be able to achieve the behavior you're looking for. Assuming the above suggestion works, I can't speak to performance: the bookkeeping and message passing may impose an unacceptable amount of additional processing on each request, but that is speculation.
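A sketch of that shape (untested; the 'done' message protocol and the busy counters are inventions here, not anything Node provides out of the box):

const cluster = require('cluster');
const http = require('http');
const net = require('net');

if (cluster.isMaster) {
  const workers = [cluster.fork(), cluster.fork()];
  const busy = new Map(workers.map((w) => [w.id, 0]));
  for (const w of workers) {
    w.on('message', (m) => { if (m === 'done') busy.set(w.id, busy.get(w.id) - 1); });
  }

  // The master owns the listening socket; since it never blocks, it can
  // make load decisions the moment a TCP connection arrives.
  net.createServer({ pauseOnConnect: true }, (socket) => {
    const free = workers.find((w) => busy.get(w.id) === 0);
    if (!free) return socket.destroy(); // all workers busy: drop immediately
    busy.set(free.id, busy.get(free.id) + 1);
    free.send('socket', socket); // hand the raw socket over
  }).listen(8000);
} else {
  const server = http.createServer((req, res) => {
    const end = Date.now() + 1000;
    while (Date.now() < end) {} // the blocking work stays in the workers
    res.end(`[pid: ${process.pid}] done`);
    process.send('done');
  });
  process.on('message', (m, socket) => {
    if (m === 'socket' && socket) {
      server.emit('connection', socket);
      socket.resume(); // was paused in the master via pauseOnConnect
    }
  });
}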

If that doesn't do the trick, it may be possible to modify the approach and go to a lower level (maybe the tcp or net modules) and see if either of those exposes a greater level of control that'd allow you to manage incoming requests the way you want. Worst case, the proxy idea does sound like it'd work in a pinch.
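On the lower-level idea: whichever process owns the listening socket can start a deadline at the TCP level, before any HTTP parsing happens (a sketch; only useful if that process's event loop stays free to run the timer):

const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));

server.on('connection', (socket) => {
  // Deadline counted from TCP accept, not from when the request
  // handler eventually runs.
  const timer = setTimeout(() => socket.destroy(), 1000);
  socket.once('close', () => clearTimeout(timer));
});

server.listen(8000);

Note this is per connection, so a keep-alive socket serving multiple requests would also be killed at the deadline.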

Answered Oct 23 '22 by pdspicer