How can we set a total request timeout or drop requests in Node (when using the cluster module)?

This question seems innocent because we can read https://nodejs.org/api/http.html and spot the following options:

server.requestTimeout = 1;
server.keepAliveTimeout = 1;
server.timeout = 1;
server.setTimeout(1);

Easy, right? Not so fast!

The timeouts only start counting once the request is accepted by the request handler in a process. This is important because there are two places requests can queue up in Node before a request gets that far (the sketch after this list demonstrates the effect):

  1. If we're using the cluster module, the main process has an in-memory request queue. If workers are currently blocked (e.g. doing template rendering), requests chill out in here before being popped off into workers
  2. If we're not using the cluster module, and the process is blocked, requests can sit in a lower level socket buffer
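To see the effect, here's a minimal sketch (the 1s busy-wait, 500ms timeout, and port are arbitrary choices for illustration): requests that arrive while the loop is blocked sit outside the reach of the timeout, so they still succeed even after queueing longer than the timeout value.

const http = require('http');

const server = http.createServer((req, res) => {
  const end = Date.now() + 1000;
  while (Date.now() < end) {} // ~1s of synchronous work, blocking the event loop
  res.end(`[pid: ${process.pid}] handled\n`);
});

// A 500ms socket timeout. The timer only exists once Node has actually
// accepted the socket; a connection waiting in the kernel backlog behind
// a blocked event loop is invisible to it.
server.setTimeout(500);
server.listen(8000);

Fire two concurrent curls at this: both succeed, even though the second one spends ~1s queued, double the 500ms timeout.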

I can't seem to find any built-in APIs that let us set timeouts that begin counting when the request hits the socket (which I suppose makes sense; how would Node even know when that happens?). Even if we tag requests with a TIME_SENT header or something, so that a middleware can drop old requests "manually", we still have to wait some amount of time for the process to unblock before that middleware can run. Node can't do this for us out of band.
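For what that manual approach looks like, here's a sketch (the x-time-sent header name and the 500ms budget are made up for illustration):

const http = require('http');

const BUDGET_MS = 500;

const server = http.createServer((req, res) => {
  // Hypothetical header: epoch milliseconds stamped by the caller.
  const sentAt = Number(req.headers['x-time-sent']);
  if (sentAt && Date.now() - sentAt > BUDGET_MS) {
    // The caller has probably given up already; fail fast instead of
    // doing the (expensive) real work.
    res.statusCode = 503;
    res.end();
    return;
  }
  res.end('still within budget\n');
});

server.listen(8000);

The catch, as above: this check only runs once the event loop is free, so a stale request still occupies the queue until then.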

But maybe I'm wrong? Is there anything I'm missing?

maxConnections

We also have server.maxConnections: https://nodejs.org/api/net.html#net_server_maxconnections

Set this property to reject connections when the server's connection count gets high

The reason why we may want to set server.timeout in the first place is to stop the request queue from building up during peak times. Setting server.maxConnections should also achieve this!

With a single process, this setting does kiiiinda almost what we want.
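The linked server is presumably something like this sketch (the ~1s synchronous block per request and maxConnections = 1 are inferred from the output below):

const http = require('http');

const server = http.createServer((req, res) => {
  const end = Date.now() + 1000;
  while (Date.now() < end) {} // ~1s of blocking work per request
  res.end(`[pid: ${process.pid}] Request seen at: ${Math.floor(Date.now() / 1000)}`);
});

// Connections beyond the first are destroyed rather than queued.
server.maxConnections = 1;
server.listen(8000);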

Using a hello world server (https://i.fluffy.cc/FvGQHxMfKGScFcSkJBw1MRVL7x7hplB3.html), we can run:

for i in $(seq 1 10); do echo "${i}: "; curl localhost:8000 & done

Here's a script that does the same thing as the curl loop, but with fancier output:
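The script isn't reproduced above, but presumably it's along these lines (a sketch using child_process.exec and console.table; the fields are inferred from the output below):

const { exec } = require('child_process');

const run = () =>
  new Promise((resolve) => {
    const start = process.hrtime.bigint();
    exec('curl -sS localhost:8000', (err, stdout, stderr) => {
      resolve({
        success: !err,
        output: (stdout || stderr).trim(),
        timeTaken: Number(process.hrtime.bigint() - start) / 1e6, // ms
      });
    });
  });

Promise.all(Array.from({ length: 10 }, run)).then((results) => {
  console.table(results);
  const ok = results.filter((r) => r.success).length;
  console.log(`Results: ${ok} / ${results.length} requests succeeded`);
});

Running it against the blocked server gives: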

$ node test.js 
┌─────────┬─────────┬─────────────────────────────────────────────────────┬─────────────┐
│ (index) │ success │                       output                        │  timeTaken  │
├─────────┼─────────┼─────────────────────────────────────────────────────┼─────────────┤
│    0    │  true   │     '[pid: 23628] Request seen at: 1615192813'      │ 1015.566695 │
│    1    │  true   │     '[pid: 23628] Request seen at: 1615192814'      │ 2012.38064  │
│    2    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1011.044642 │
│    3    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1009.696098 │
│    4    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1007.128426 │
│    5    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1005.725304 │
│    6    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1004.444839 │
│    7    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1003.128797 │
│    8    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1001.813047 │
│    9    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1000.523041 │
└─────────┴─────────┴─────────────────────────────────────────────────────┴─────────────┘

Results: 2 / 10 requests succeeded

...and indeed we see the value respected!

However, notice that we have to wait for the blocking operation to yield before Node actually terminates the other requests. We only get the Recv failure after ~1 second, rather than immediately. Not ideal, but I'm not sure what else we could possibly do.

(Unless I'm missing something, and there is a magic out-of-band way to kill requests?)

Cluster Module

And what if we're tied to using the Node cluster module? Perhaps we're using a library that depends on it (e.g. hypernova).

Here's a clusterized hello world server.

Now we have one main process and 2 workers, each with server.maxConnections = 1.
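Presumably it looks something like this sketch (the worker count and per-worker limit are taken from the description above):

const cluster = require('cluster');
const http = require('http');

if (cluster.isMaster) {
  // Main process: just fork; it distributes connections to workers
  // (round-robin by default on most platforms).
  cluster.fork();
  cluster.fork();
} else {
  const server = http.createServer((req, res) => {
    const end = Date.now() + 1000;
    while (Date.now() < end) {} // same ~1s of blocking work
    res.end(`[pid: ${process.pid}] Request seen at: ${Math.floor(Date.now() / 1000)}`);
  });
  server.maxConnections = 1; // per worker
  server.listen(8000);       // shared with the main process via the cluster module
}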

When we run the same set of parallel requests, we see the following output:

$ node test.js 
┌─────────┬─────────┬─────────────────────────────────────────────────────┬─────────────┐
│ (index) │ success │                       output                        │  timeTaken  │
├─────────┼─────────┼─────────────────────────────────────────────────────┼─────────────┤
│    0    │  true   │      '[pid: 6246] Request seen at: 1615162602'      │ 1012.041639 │
│    1    │  true   │      '[pid: 6247] Request seen at: 1615162602'      │ 1009.773912 │
│    2    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1006.635283 │
│    3    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1006.259198 │
│    4    │  true   │      '[pid: 6246] Request seen at: 1615162603'      │ 2001.560547 │
│    5    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1001.442373 │
│    6    │  true   │      '[pid: 6247] Request seen at: 1615162603'      │ 2001.787769 │
│    7    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1998.214606 │
│    8    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 1999.735996 │
│    9    │  false  │ 'curl: (56) Recv failure: Connection reset by peer' │ 998.881888  │
└─────────┴─────────┴─────────────────────────────────────────────────────┴─────────────┘

Results: 4 / 10 requests succeeded

Here's a visualization of this request timeline:

[image: request timeline]

Hold on! If we have 2 workers, each with server.maxConnections = 1, then if everything worked the same way we should still only process 2 requests total... but we see 4 requests processed! hmmm!

And unlike the single-process version, we have to wait ~2 seconds in some cases for requests at the back of the queue to be dropped. Why? Ideally they'd similarly be dropped at 1s.

I'm not sure why we see [request] [drop] [drop] [request] [drop] rather than:

  • [request] [request] [drop] [drop] [drop] or even;
  • [request] [drop] [drop] [drop] [request]

Any idea what's going on here?

(FWIW, here's a little deep dive into how the request queues appear to work at a lower level in the cluster module)

Addressing the XY Problem

  • "Why would you even need to want to do this? Just manually kill requests based on a request budget header!"

Yep, doing this anyway, but the calling client/mesh still shouldn't have to wait out the full timeout before it can retry its request against a different host

  • "Why are your workers blocked? Don't do blocking work in node!"

    When server side rendering, ReactDOM.renderToString() is synchronous :P

This is expected behaviour

It's not so much a mystery why this is happening as a question of how we can work around it and be a "good mesh citizen"

Perhaps a new API exposed via Node, so the main process in the cluster module has access to the request queue?

Current solutions I can think of involve setting up a proxy to manually broker requests and manage the request queue, but this is icky and subverts some of the wins of using the cluster module in the first place.
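For reference, the icky proxy version might look something like this sketch (the upstream address, the limits, and the status codes are all assumptions):

const http = require('http');

const MAX_IN_FLIGHT = 2;  // assumption: roughly the number of workers
const DEADLINE_MS = 1000; // total budget, counted from arrival at the proxy

let inFlight = 0;

http.createServer((clientReq, clientRes) => {
  if (inFlight >= MAX_IN_FLIGHT) {
    clientRes.statusCode = 503; // shed load immediately so the mesh can retry elsewhere
    clientRes.end();
    return;
  }
  inFlight++;

  const finish = (code) => {
    if (!clientRes.headersSent) clientRes.statusCode = code;
    if (!clientRes.writableEnded) clientRes.end();
  };

  const upstream = http.request({
    host: '127.0.0.1', // assumption: the clustered app listens here
    port: 8000,
    path: clientReq.url,
    method: clientReq.method,
    headers: clientReq.headers,
  }, (upstreamRes) => upstreamRes.pipe(clientRes));

  // This timer fires on time because the proxy process never blocks.
  const timer = setTimeout(() => { upstream.destroy(); finish(504); }, DEADLINE_MS);

  upstream.on('error', () => finish(502));
  upstream.on('close', () => { clearTimeout(timer); inFlight--; });

  clientReq.pipe(upstream);
}).listen(8080);

The whole point is that the proxy's event loop stays free, so its queue limit and deadline are enforced the moment a request arrives, rather than whenever a worker unblocks.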

Asked Mar 08 '21 by Mark


1 Answer

You might check out the answer given here, which describes roughly what you are trying to achieve (I believe), as far as accessing connections from the master process. It relies on some manual setup of the http server in the master process, which allows for more customized distribution of requests (or in your case, more customized load handling).

That answer doesn't provide any specifics about using maxConnections or requestTimeout to handle load and/or fail early, but:

  • if you set those values (maxConnections/requestTimeout) on the master process' http/s server (where in theory you're not executing any long-running/blocking tasks)
  • and if you can keep track of the state of each worker (and their requests in progress) from the master

you might be able to achieve the behavior you're looking for. Assuming the above suggestion works, I can't speak to performance: the bookkeeping and message passing may impose an unacceptable amount of additional processing on each request, but that is speculation.
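A sketch of that shape (untested; the 'done' message protocol and the busy counters are inventions here, not anything Node provides out of the box):

const cluster = require('cluster');
const http = require('http');
const net = require('net');

if (cluster.isMaster) {
  const workers = [cluster.fork(), cluster.fork()];
  const busy = new Map(workers.map((w) => [w.id, 0]));
  for (const w of workers) {
    w.on('message', (m) => { if (m === 'done') busy.set(w.id, busy.get(w.id) - 1); });
  }

  // The master owns the listening socket; since it never blocks, it can
  // make load decisions the moment a TCP connection arrives.
  net.createServer({ pauseOnConnect: true }, (socket) => {
    const free = workers.find((w) => busy.get(w.id) === 0);
    if (!free) return socket.destroy(); // all workers busy: drop immediately
    busy.set(free.id, busy.get(free.id) + 1);
    free.send('socket', socket); // hand the raw socket over
  }).listen(8000);
} else {
  const server = http.createServer((req, res) => {
    const end = Date.now() + 1000;
    while (Date.now() < end) {} // the blocking work stays in the workers
    res.end(`[pid: ${process.pid}] done`);
    process.send('done');
  });
  process.on('message', (m, socket) => {
    if (m === 'socket' && socket) {
      server.emit('connection', socket);
      socket.resume(); // was paused in the master via pauseOnConnect
    }
  });
}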

If that doesn't do the trick, it may be possible to modify the approach and go to a lower level (maybe the tcp or net modules) and see if either of those exposes a greater level of control that'd allow you to manage incoming requests the way you want. Worst case, the proxy idea does sound like it'd work in a pinch.
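On the lower-level idea: whichever process owns the listening socket can start a deadline at the TCP level, before any HTTP parsing happens (a sketch; only useful if that process's event loop stays free to run the timer):

const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));

server.on('connection', (socket) => {
  // Deadline counted from TCP accept, not from when the request
  // handler eventually runs.
  const timer = setTimeout(() => socket.destroy(), 1000);
  socket.once('close', () => clearTimeout(timer));
});

server.listen(8000);

Note this is per connection, so a keep-alive socket serving multiple requests would also be killed at the deadline.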

Answered Oct 23 '22 by pdspicer