Node.JS Unbounded Concurrency / Stream backpressure over TCP

As I understand it, one of the consequences of Node's evented IO model is the inability to tell a Node process that is (for example) receiving data over a TCP socket, to block, once you've hooked up your receiving event handlers (or otherwise started listening for data).

If the receiver can't process the incoming data fast enough, "unbounded concurrency" can result, whereby Node, under the hood, continues to read data off the socket as fast as it can, scheduling new data events on the event loop instead of blocking on the socket, until the process eventually runs out of memory and dies.

The receiver can't tell node to slow its reading, which would otherwise allow TCP's inbuilt flow control mechanisms to kick in and indicate to the sender that it needs to slow down.

Firstly, is what I've described so far accurate? Is there something I've missed that allows node to avoid this situation?

One of the much touted features of Node Streams is the automatic handling of backpressure.

AFAIK, the only way a writable stream (of a TCP socket) can tell whether it needs to slow down is by looking at socket.bufferSize (the amount of data written to the socket but not yet sent). Given that Node at the receiving end always reads as fast as it can, this can only indicate a slow network connection between sender and receiver, and NOT whether the receiver can't keep up.
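For concreteness, here is a minimal sketch of what the sending side can actually observe (the host, port, and sizes are placeholders):

```js
const net = require('net');

const socket = net.connect(9000, 'example.com', () => {
  // write() returns false once the internal buffer exceeds the highWaterMark;
  // socket.bufferSize reports bytes queued locally but not yet sent.
  const ok = socket.write(Buffer.alloc(64 * 1024));
  console.log('buffered bytes:', socket.bufferSize, 'keep writing?', ok);

  if (!ok) {
    // 'drain' fires once the locally buffered data has been flushed out.
    socket.once('drain', () => console.log('drained, safe to write again'));
  }
});
```

Both of those signals reflect only the local send buffer, which is exactly my concern.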

So secondly, can Node Streams automatic backpressure somehow work in this situation to deal with a receiver that can't keep up?

It also seems that this problem affects browsers receiving data via WebSockets, for a similar reason: the WebSocket API doesn't provide a mechanism to tell the browser to slow its reading from the socket.

Is the only solution to this problem for Node (and browsers using WebSockets) to implement a manual flow control mechanism at the application level, to explicitly tell the sending process to slow down?

asked Aug 11 '14 by MikeL

People also ask

What function can be used to automatically handle back pressure on streams?

The pipe function helps to set up the appropriate backpressure closures for the event triggers. In Node.js the source is a Readable stream and the consumer is the Writable stream (both of these may be interchanged with a Duplex or a Transform stream, but that is out of scope for this guide).
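As a minimal sketch (the file names are placeholders), piping a Readable into a Writable lets Node handle the pausing and resuming internally:

```js
const fs = require('fs');

const source = fs.createReadStream('input.dat');        // Readable
const destination = fs.createWriteStream('output.dat'); // Writable

// pipe() wires up the backpressure handling: the readable is paused whenever
// the writable's internal buffer is full, and resumed on its 'drain' event.
source.pipe(destination);
```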

What is highWaterMark in Node.js?

The highWaterMark option gives you some control on the amount of "buffer memory" used. Once you've written more than the amount specified, write will return false to give you an opportunity to stop writing.
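A small sketch, assuming a file Writable with a deliberately small highWaterMark (the file name and sizes are arbitrary):

```js
const fs = require('fs');

// Keep at most ~16 KB in the writable's internal buffer.
const out = fs.createWriteStream('out.log', { highWaterMark: 16 * 1024 });

const ok = out.write(Buffer.alloc(32 * 1024)); // exceeds the highWaterMark
console.log(ok); // false: a hint to stop writing for now

out.once('drain', () => {
  console.log('buffer emptied, writing can continue');
});
```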

What is stream.PassThrough()?

PassThrough is a trivial implementation of a Transform stream that simply passes the input bytes through to the output. It is mainly useful for testing and a few other trivial use cases. Below is an example of a PassThrough stream sitting in a pipe from a readable stream to a writable stream.
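A minimal sketch (the file names are placeholders); the PassThrough forwards bytes unchanged, which makes it handy for tapping into a pipeline:

```js
const { PassThrough } = require('stream');
const fs = require('fs');

const tap = new PassThrough();

// Count bytes as they flow through, without altering them.
let total = 0;
tap.on('data', (chunk) => { total += chunk.length; });
tap.on('end', () => console.log(`${total} bytes passed through`));

fs.createReadStream('input.dat')
  .pipe(tap)
  .pipe(fs.createWriteStream('output.dat'));
```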

What is libuv and how does Node.js use it?

libuv is a C library originally written for Node.js to abstract non-blocking I/O operations. It implements the event-driven asynchronous I/O model, allowing the CPU and other resources to keep working while I/O operations are still in progress, which results in efficient use of resources and the network.


1 Answer

To answer your first question, I believe your understanding is not accurate -- at least not when piping data between streams. In fact, if you read the documentation for the pipe() function you'll see that it explicitly says that it automatically manages the flow so that "destination is not overwhelmed by a fast readable stream."

The underlying implementation of pipe() is taking care of all of the heavy lifting for you. The input stream (a Readable stream) will continue to emit data events until the output stream (a Writable stream) is full. As an aside, if I remember correctly, the stream will return false when you attempt to write data that it cannot currently process. At this point, the pipe will pause() the Readable stream, which will prevent it from emitting further data events. Thus, the event loop isn't going to fill up and exhaust your memory nor is it going to emit events that are simply lost. Instead, the Readable will stay paused until the Writable stream emits a drain event. At that point, the pipe will resume() the Readable stream.
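Roughly, the logic pipe() implements for you looks something like this sketch (simplified; the file names are placeholders, and the real implementation also handles errors, unpipe, and cleanup):

```js
const fs = require('fs');

const readable = fs.createReadStream('big-input.dat'); // placeholder source
const writable = fs.createWriteStream('copy.dat');     // placeholder destination

readable.on('data', (chunk) => {
  if (!writable.write(chunk)) {
    // The writable's buffer is over its highWaterMark: stop emitting 'data'
    // events until the writable signals 'drain'.
    readable.pause();
    writable.once('drain', () => readable.resume());
  }
});

readable.on('end', () => writable.end());
```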

The secret sauce is piping one stream into another, which manages the backpressure for you automatically. This hopefully answers your second question, which is that Node can and does automatically manage this by simply piping streams.

And finally, there is really no need to implement this manually (unless you are writing a new stream from scratch) since it is already provided for you. :)
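Applied to the TCP case you describe, a receiving server can simply pipe the socket into its (possibly slow) Writable. When that Writable's buffer fills, the socket is paused, Node stops reading, the kernel receive buffer fills, and TCP's own flow control throttles the sender. A rough sketch (the port and the artificial delay are made up for illustration):

```js
const net = require('net');
const { Writable } = require('stream');

net.createServer((socket) => {
  // A deliberately slow consumer: it takes 100 ms to handle each chunk.
  const slowConsumer = new Writable({
    write(chunk, encoding, callback) {
      setTimeout(callback, 100); // pretend each chunk takes real work
    },
  });

  // pipe() pauses the socket while slowConsumer is backed up, so Node stops
  // reading from the kernel buffer and TCP flow control pushes back on the
  // remote sender.
  socket.pipe(slowConsumer);
}).listen(9000);
```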

Handling all of this is not easy, as admitted in the Node blog post that announced the streams2 API. It's a great resource and certainly provides much more information than I could here. One little gotcha that isn't entirely obvious but that you should know about, from the docs, and kept for backwards-compatibility reasons:

If you attach a data event listener, then it will switch the stream into flowing mode, and data will be passed to your handler as soon as it is available.

So just be aware that attaching a data event listener in an attempt to observe something in the stream will fundamentally switch the stream back to the old way of doing things. Ask me how I know.
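For example (a hypothetical snippet, file names are placeholders), adding a 'data' listener on its own puts the stream into flowing mode with nothing applying backpressure; if the stream is also piped, pipe() still manages pause/resume, so the listener becomes a safe way to peek at the flow:

```js
const fs = require('fs');

const readable = fs.createReadStream('input.dat');

// On its own, this switches the stream into flowing mode: chunks are emitted
// as fast as they can be read, and nothing here pushes back on the source.
readable.on('data', (chunk) => {
  console.log('observed', chunk.length, 'bytes');
});

// Because the stream is also piped, pipe() continues to pause and resume it,
// so the 'data' listener above only observes the regulated flow.
readable.pipe(fs.createWriteStream('output.dat'));
```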

answered Sep 21 '22 by twofifty6