
Nodejs - streaming readable & writable misunderstood

Tags: javascript, stream, node.js

I run a node server and have the following code:

var fs = require('fs');

var readable = fs.createReadStream(__dirname + '/greet.txt',
    {encoding: 'utf8', highWaterMark: 332 * 1024});

greet.txt:

hello

I am having trouble understanding readable and writable streams. In my code above, I have a readable stream which reads from greet.txt: chunks enter the buffer and I can see the binary data. The issue is, shouldn't there be a writable stream which sends data to my buffer on the other side? How does binary data suddenly start flying into my buffer? It's just not clear.

Here is a combination of readable and writable:

var fs = require('fs');

var readable = fs.createReadStream(__dirname + '/greet.txt',
    {encoding: 'utf8', highWaterMark: 332 * 1024});

var writeable = fs.createWriteStream(__dirname + '/greetcopy.txt');

// Forward every chunk read from greet.txt to greetcopy.txt.
readable.on('data', function(chunk){
    writeable.write(chunk);
});

As a chunk arrives in the readable stream's buffer and is sent to the writable stream's buffer through an event, shouldn't the writable stream be readable too in order to receive the data? And once the writable stream's buffer gets the data from the readable and sends it to the greetcopy.txt file (which is empty), how does the data arrive?

The concepts of readable and writable in Node are over-simplified and I just have a hard time grasping them. Thank you for your time, I'd like some info on what's going on behind the scenes...


asked Feb 16 '16 16:02

RunningFromShia

1 Answer

Node.js streams are extremely convoluted and confusing. I spent a large amount of time trying to understand them, and I'll try to convey my findings below.

There are 5 types: Readable, Writable, Duplex, Transform and PassThrough.

Ok, the easy part first: Readable and Writable

Readable

To add data to a readable stream, you use the .push() function. When the stream is finished, you push(null).
When ended, readable streams fire the 'end' event.
You can read data from a readable stream by listening for the 'readable' event and then executing 'read()' until it returns null.
Readable streams have a buffer, meaning that when you push() to the stream and the buffer is full, push() will return false. However, you can continue pushing and filling the buffer even though it's full. The 'highWaterMark' (or the buffer size) is really advisory: it is a threshold, not a hard limit.
Readable streams implement a _read() method to pull data from a non-stream source. You don't have to do any work in it, however. You can just implement it as an empty method and use the push method described earlier. Whoever is using your stream could call read(), which would first read from the internal buffer, and then go call _read() when the buffer is empty.
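To make the points above concrete, here is a minimal sketch of a Readable that is fed entirely from outside with push(). The names `source`, `firstPush`, `secondPush` and `text` are made up for illustration, and the tiny highWaterMark of 4 bytes is an assumption chosen only so the buffer "fills up" after a few characters.

```javascript
const { Readable } = require('stream');

// A Readable with an empty _read(): all data is supplied from outside via push().
// highWaterMark is set to 4 bytes purely for demonstration.
const source = new Readable({ highWaterMark: 4, read() {} });

const firstPush = source.push('hel');  // 3 bytes buffered, below the mark: returns true
const secondPush = source.push('lo');  // 5 bytes buffered, at or above the mark: returns false
source.push(null);                     // no more data; 'end' fires once the buffer is drained

console.log('push results:', firstPush, secondPush);

let text = '';
source.on('readable', () => {
  let chunk;
  // read() returns buffered data, then null once the buffer is empty.
  while ((chunk = source.read()) !== null) {
    text += chunk.toString('utf8');
  }
});

source.on('end', () => {
  console.log('end, got:', text);
});
```

Note that the second push() still succeeds even though it returns false; the return value is only a hint that the producer should slow down.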

Writable

To add data to a writable stream, you use the .write() function. When the stream is finished, you use .end().
When you call .end(), it does NOT finish the stream immediately. Any data still in the buffer has to be flushed through _write() first, and the 'finish' event is only emitted asynchronously after that (on a later tick at the earliest). This has caused many race condition heartaches for me.
Writable streams have a buffer. If the buffer is full (highWaterMark), then .write() will return false. However, you can keep writing to it and ignore this return value if you want; the data is still buffered, at the cost of memory. Otherwise, wait for the 'drain' event, which notifies you that you can continue writing.
Writable streams implement a _write(chunk, encoding, callback) method to send the data to some back-end non-stream sink. The return value of _write() is ignored; you signal that a chunk has been handled by calling the callback. Until the callback is called, the Writable stream buffers any further writes and does not call _write() again.
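The Writable side can be sketched the same way. This is only an illustration under stated assumptions: `stored` is a hypothetical in-memory array standing in for a real sink such as a file, `setImmediate` simulates a slow sink, and the 4-byte highWaterMark is chosen so that a single write already fills the buffer.

```javascript
const { Writable } = require('stream');

const stored = [];  // hypothetical in-memory sink standing in for a file or socket

const sink = new Writable({
  highWaterMark: 4,  // tiny on purpose so one chunk fills the buffer
  // _write() receives each chunk; calling callback() tells the stream
  // that the chunk was handled and the next one may be delivered.
  write(chunk, encoding, callback) {
    stored.push(chunk.toString('utf8'));
    setImmediate(callback);  // simulate a slow, asynchronous sink
  }
});

// 5 bytes >= highWaterMark, so write() returns false (the data is still accepted).
const ok = sink.write('hello');
console.log('safe to keep writing?', ok);

sink.once('drain', () => {
  // The buffer has emptied, so writing can continue. end() writes a last chunk.
  sink.end(' world');
});

sink.on('finish', () => {
  // 'finish' fires only after every chunk has gone through _write().
  console.log('stored:', stored.join(''));
});
```

Waiting for 'drain' before writing again is what keeps memory bounded when the sink is slower than the producer.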

Using Readable and Writable Streams together

Each pipe() call connects one readable stream to one writable stream. This might confuse you, since you might have seen syntax like 'streamA.pipe(streamB).pipe(streamC)'... etc. The fact of the matter is, pipe() returns its destination, so this is really two separate pipes: streamA to streamB, and streamB to streamC. streamA only needs to be Readable and streamC only needs to be Writable, but streamB (and any other streams in between) must be both: a Duplex stream, most commonly the special kind called a Transform stream.
Key Point 1: You cannot pipe to a stream that is only Readable. Everything must start at a Readable stream.
Key Point 2: You cannot pipe a stream that is only Writable to anything else. A writable stream is where it ends. The data must exit a writable stream via the _write() method.
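A chain like 'streamA.pipe(streamB).pipe(streamC)' can be sketched as follows. The upper-casing transform and the `collected` array are assumptions made up for this example; the only point is that pipe() returns its destination, so the chain is two independent pipes with a readable-and-writable stream in the middle.

```javascript
const { Readable, Writable, Transform } = require('stream');

// Source: Readable only.
const streamA = Readable.from(['a', 'b', 'c']);

// Middle: written to by streamA, read from by streamC.
const streamB = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  }
});

// Destination: Writable only. `collected` is a hypothetical in-memory sink.
const collected = [];
const streamC = new Writable({
  write(chunk, encoding, callback) {
    collected.push(chunk.toString());
    callback();
  }
});

// pipe() returns the destination stream, so this is the same as
// streamA.pipe(streamB).pipe(streamC).
const returned = streamA.pipe(streamB);  // pipe 1: A -> B
returned.pipe(streamC);                  // pipe 2: B -> C
console.log(returned === streamB);       // true

streamC.on('finish', () => {
  console.log(collected.join(''));
});
```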

The only way to chain pipes through a stream is for that middle stream to be a Duplex stream, and in practice that usually means a Transform stream. With me so far? Here is where it gets extremely confusing: Duplex, Transform and PassThrough

Duplex

A duplex stream is a readable and writable stream combined. When you pipe from a duplex stream (or read from a duplex stream), it operates as a Readable stream. When you pipe to a duplex stream, it operates the exact way a Writable stream does.
Key Point 1: The example 'streamA.pipe(duplexB).pipe(streamC)' means that data is read from Readable streamA's _read() method and sent to duplexB's _write() method. What is written to duplexB does NOT automatically go to streamC. Separately, data produced by duplexB's _read() method goes to streamC. The syntax is confusing because it looks like the data is going in a line from streamA to streamC.
Key Point 2: It is super confusing when using duplex streams whether to call .push(null) or .end() to end the stream. It's also super confusing whether you should listen to the 'end' or 'finish' event. I still don't have an answer to this. Does calling end() implicitly do a .push(null)?

Both of these key points make using Duplex streams extremely confusing. In fact, I wanted a bi-directional stream that worked exactly as above, so I created my own here. I call it the 'link-stream', and it doesn't actually use the _read or _write methods. It takes data from streamA and pipes it to streamC and vice versa in full duplex mode, and you can listen on the 'finish' or 'end' event, it doesn't matter. It's a true bi-directional passthrough pipe.
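Coming back to the two Duplex key points, here is a small sketch that may help. As far as I can tell, in a plain Duplex (with the default allowHalfOpen setting) the two sides are independent: .end() finishes only the writable side and leads to 'finish', while .push(null) ends only the readable side and leads to 'end'. The names `written`, `events` and `readBack` are made up for this example.

```javascript
const { Duplex } = require('stream');

const written = [];  // hypothetical sink for the writable side
const events = [];   // records which lifecycle events fired

const duplexB = new Duplex({
  // Readable side: produces its own data, unrelated to what gets written.
  read() {
    this.push('from the readable side');
    this.push(null);  // ends only the readable side -> 'end'
  },
  // Writable side: consumes whatever is written into `written`.
  write(chunk, encoding, callback) {
    written.push(chunk.toString());
    callback();
  }
});

duplexB.on('finish', () => events.push('finish'));  // writable side is done
duplexB.on('end', () => events.push('end'));        // readable side is done

let readBack = '';
duplexB.on('data', (chunk) => { readBack += chunk.toString(); });

// Ends only the writable side -> 'finish'. It does not push(null).
duplexB.end('to the writable side');
```

Nothing written to `duplexB` shows up in `readBack`, which is exactly why the data in 'streamA.pipe(duplexB).pipe(streamC)' does not travel in a straight line. A Transform stream behaves differently: there, ending the writable side also ends the readable side once the remaining data has been transformed.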

Transform

A Transform stream is a Duplex stream whose readable output is computed from its writable input.
Calling write() on a transform stream calls _write() under the covers, which hands the chunk to _transform().
Calling this.push(...) inside _transform() puts the result on the readable side; _read() merely asks for more data when the consumer is ready.
Basically all data paths lead to the _transform() method. You implement the _transform() method. No matter how you use the stream, it can act as both a readable and a writable, and the data always goes through the same place, the _transform() method.
Whatever _transform() pushes is then sent to any writable stream the transform is piped to.
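As a rough illustration of that data path, here is a minimal Transform that upper-cases its input. The names `upperCase` and `output` are made up for this sketch; the thing to notice is that everything written on the writable side comes out of the readable side after passing through _transform().

```javascript
const { Transform } = require('stream');

const upperCase = new Transform({
  // Every chunk written to the stream arrives here.
  transform(chunk, encoding, callback) {
    this.push(chunk.toString().toUpperCase());  // goes to the readable side
    callback();                                 // ready for the next chunk
  }
});

let output = '';
upperCase.on('data', (chunk) => { output += chunk.toString(); });
upperCase.on('end', () => {
  console.log(output);
});

// Use the writable side directly; piping a Readable into it works the same way.
upperCase.write('hello ');
upperCase.end('world');
```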

PassThrough

This is just a Transform stream whose _transform() method passes every chunk through unchanged.
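A quick sketch of that, with made-up names `tap` and `seen`: whatever is written into a PassThrough can be read back out byte for byte, which makes it handy as a placeholder or a tap in the middle of a pipe chain.

```javascript
const { PassThrough } = require('stream');

const tap = new PassThrough();

let seen = '';
tap.on('data', (chunk) => { seen += chunk.toString(); });
tap.on('end', () => {
  console.log('passed through:', seen);
});

// Data written on one side comes out unchanged on the other.
tap.write('hel');
tap.end('lo');
```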

So there you have it. I really hope that the Joyent folks clean up Duplex and make it less confusing, and I really hope they add a bi-directional PassThrough, so I don't have to use the link-stream method I described above.

Good Luck!


answered Oct 12 '22 11:10

datasedai
