After reading some code on GitHub, it seems like I misunderstood how the highWaterMark concept works.
In the case of a writable stream that writes a large amount of data as fast as possible, here is the lifecycle I had in mind:
1) While the highWaterMark limit is not reached, the stream is able to buffer and write data.
2) If the highWaterMark limit is reached, the stream cannot buffer anymore, so the #write method returns false to let you know that what you tried to write will never be written.
3) Once the stream emits a drain event, it means that the buffer has been flushed, and you can write again from where you got "rejected".
It was clear and simple in my mind, but it looks like this is not exactly true (at step 2). Is the data you try to write really "rejected" when the #write method returns false? Or is it buffered (or something else)?
Sorry for the basic question, but I need to be sure!
highWaterMark is just an indicator to communicate upstream that no more data should be written/pushed. You can still write more data, and it will simply continue to be buffered in memory. highWaterMark does not dictate the size of chunks passed to _transform() / _write().
The highWaterMark option gives you some control on the amount of "buffer memory" used. Once you've written more than the amount specified, write will return false to give you an opportunity to stop writing.
2) If the highWaterMark limit is reached, the stream cannot buffer anymore, so the #write method returns false to let you know that what you tried to write will never be written.
This is false: the data is still buffered, and the stream doesn't lose it. But you should stop writing at this point, to allow the backpressure to propagate.
Your question is addressed in the writable.write(chunk[, encoding][, callback]) docs:

This return value is strictly advisory. You MAY continue to write, even if it returns false. However, writes will be buffered in memory, so it is best not to do this excessively. Instead, wait for the 'drain' event before writing more data.
Is the data you try to write really "rejected" when the #write method returns false? Or is it buffered (or something else)?
The data is buffered. However, excessive calls to write() without allowing the buffer to drain will cause high memory usage, poor garbage collector performance, and could even cause Node.js to crash with an Allocation failed - JavaScript heap out of memory error. See this related question: Node: fs write() doesn't write inside loop. Why not?
For reference, here are some relevant details on highWaterMark and backpressure from the current docs (v8.4.0):
writable.write()

The return value is true if the internal buffer is less than the highWaterMark configured when the stream was created after admitting chunk. If false is returned, further attempts to write data to the stream should stop until the 'drain' event is emitted.

While a stream is not draining, calls to write() will buffer chunk, and return false. Once all currently buffered chunks are drained (accepted for delivery by the operating system), the 'drain' event will be emitted. It is recommended that once write() returns false, no more chunks be written until the 'drain' event is emitted. While calling write() on a stream that is not draining is allowed, Node.js will buffer all written chunks until maximum memory usage occurs, at which point it will abort unconditionally. Even before it aborts, high memory usage will cause poor garbage collector performance and high RSS (which is not typically released back to the system, even after the memory is no longer required).
In any scenario where the data buffer has exceeded the highWaterMark or the write queue is currently busy, .write() will return false.

When a false value is returned, the backpressure system kicks in. It will pause the incoming Readable stream from sending any data and wait until the consumer is ready again. Once the data buffer is emptied, a 'drain' event will be emitted and resume the incoming data flow.

Once the queue is finished, backpressure will allow data to be sent again. The space in memory that was being used will free itself up and prepare for the next batch of data.
+-------------------+ +=================+
| Writable Stream +---------> .write(chunk) |
+-------------------+ +=======+=========+
|
+------------------v---------+
+-> if (!chunk) | Is this chunk too big? |
| emit .end(); | Is the queue busy? |
+-> else +-------+----------------+---+
| emit .write(); | |
^ +--v---+ +---v---+
^-----------------------------------< No | | Yes |
+------+ +---v---+
|
emit .pause(); +=================+ |
^-----------------------+ return false; <-----+---+
+=================+ |
|
when queue is empty +============+ |
^-----------------------< Buffering | |
| |============| |
+> emit .drain(); | ^Buffer^ | |
+> emit .resume(); +------------+ |
| ^Buffer^ | |
+------------+ add chunk to queue |
| <---^---------------------<
+============+