
Nodejs - streaming readable & writable misunderstood

I run a node server and have the following code:

var fs = require('fs');

var readable = fs.createReadStream(__dirname + '/greet.txt',
  {encoding: 'utf8', highWaterMark: 332 * 1024});

greet.txt:

hello

I am having trouble understanding readable and writable streams. In the code above, I have a readable stream which reads from greet.txt: chunks enter the buffer and I can see the binary data. The issue is, shouldn't there be a writable stream which sends data to my buffer on the other side? How does binary data suddenly start flying into my buffer? It's just not clear.

Here is a combination of readable and writable:

var readable = fs.createReadStream(__dirname + '/greet.txt',
{encoding: 'utf8', highWaterMark: 332 * 1024});

var writeable = fs.createWriteStream(__dirname + '/greetcopy.txt');

readable.on('data', function(chunk){
  writeable.write(chunk);
});

As a chunk arrives in the readable stream's buffer and is sent to the writable stream's buffer through an event, shouldn't the writable stream be readable too in order to receive the data? And once the writable stream's buffer gets the data from the readable stream and sends it to the greetcopy.txt file (which is empty), how does the data arrive?

The concepts of readable and writable in Node are over-simplified and I have a hard time grasping them. Thank you for your time; I'd like some info on what's going on behind the scenes...

asked Feb 16 '16 16:02 by RunningFromShia



1 Answer

Node.js streams are extremely convoluted and confusing. I spent a large amount of time trying to understand them, and I'll try to convey my findings below.

There are 5 types: Readable, Writable, Duplex, Transform and PassThrough.

Ok, the easy part first: Readable and Writable

Readable

  • To add data to a readable stream, you use the .push() function. When the stream is finished, you call push(null).
  • When ended, readable streams fire the 'end' event (once all buffered data has been consumed).
  • You can read data from a readable stream by listening for the 'readable' event and then calling read() until it returns null.
  • Readable streams have a buffer, meaning that when you push() and the buffer has reached its highWaterMark, push() returns false. However, you can continue pushing to the buffer and filling it even though it's full. The highWaterMark (the nominal buffer size) is advisory, not a hard limit.
  • Readable streams implement a _read() method to pull data from a non-stream source. You don't have to do the work there, however. You can give this method an empty body and use the push method described earlier. Whoever is using your stream calls read(), which first reads from the internal buffer and then calls _read() when the buffer runs low.
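
The points above can be sketched as follows. This is only an illustration, not part of the original answer: createCounter and collect are hypothetical helper names, and the source simply counts from 1 to a limit inside its _read() implementation (supplied through the `read` constructor option).

```javascript
const { Readable } = require('stream');

// Hypothetical source: emits the strings '1'..'limit', then ends.
function createCounter(limit) {
  let n = 0;
  return new Readable({
    encoding: 'utf8',
    // Passed as the `read` option, this becomes the stream's _read().
    // It is called when the internal buffer needs more data.
    read() {
      n += 1;
      if (n > limit) {
        this.push(null);       // no more data: 'end' fires once the buffer is consumed
      } else {
        this.push(String(n));  // returns false once highWaterMark is reached
      }
    }
  });
}

// Consume with the 'readable' event, calling read() until it returns null.
function collect(stream, done) {
  const chunks = [];
  stream.on('readable', function () {
    let chunk;
    while ((chunk = stream.read()) !== null) {
      chunks.push(chunk);
    }
  });
  stream.on('end', function () {
    done(chunks.join(''));
  });
}

collect(createCounter(3), function (text) {
  console.log(text); // prints "123"
});
```

How the data is split across individual read() calls is up to the stream, which is why the sketch joins the chunks instead of relying on their boundaries.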

Writable

  • To add data to a writable stream, you use the .write() function. When the stream is finished, you use .end().
  • When you call .end(), it does NOT end the stream immediately. The 'finish' event fires asynchronously, only after everything already written has been flushed through _write(). This has caused many race condition heartaches for me.
  • Writable streams have a buffer. If the buffer is full (highWaterMark reached), then .write() returns false. However, you can keep writing to it and ignore the return value if you want; the data just piles up in memory. Otherwise, wait for the 'drain' event, which notifies you that you can continue writing.
  • Writable streams implement a _write() method to send the data to some back-end non-stream sink. _write() receives a callback and must call it once the chunk has been handled. Until it does, the Writable stream buffers further data and does not call _write() again.
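
A minimal sketch of the write()/'drain'/end() cycle described above. The names createSink and writeAll are hypothetical, and the tiny highWaterMark of 4 bytes is an assumption chosen only to make write() return false quickly.

```javascript
const { Writable } = require('stream');

// Hypothetical sink: stores chunks in memory and completes each write asynchronously.
function createSink(store) {
  return new Writable({
    highWaterMark: 4, // tiny buffer (in bytes) so backpressure is easy to trigger
    // Passed as the `write` option, this becomes the stream's _write().
    write(chunk, encoding, callback) {
      store.push(chunk.toString());
      setImmediate(callback); // calling back is what signals "ready for the next chunk"
    }
  });
}

// Write every item, pausing whenever write() returns false until 'drain' fires.
function writeAll(sink, items, done) {
  let i = 0;
  let waits = 0;
  sink.on('finish', function () {
    done(waits);
  });
  (function next() {
    while (i < items.length) {
      const ok = sink.write(items[i++]);
      if (!ok) {
        waits += 1;
        sink.once('drain', next); // resume only when the buffer has emptied
        return;
      }
    }
    sink.end(); // 'finish' fires later, after buffered writes are flushed
  })();
}

const received = [];
writeAll(createSink(received), ['hello', ' ', 'world'], function (waits) {
  console.log(received.join(''), '(paused for drain ' + waits + ' time(s))');
});
```

Ignoring the return value of write() would still deliver every chunk here; the pause only keeps memory use bounded when the sink is slower than the producer.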

Using Readable and Writable Streams together

  • Each .pipe() call connects one Readable stream to one Writable stream. This might confuse you, since you might have seen syntax like 'streamA.pipe(streamB).pipe(streamC)'... etc. That chaining works because pipe() returns its destination. The fact of the matter is, the only plain Readable stream in this example is streamA. The only plain Writable stream is streamC. streamB (and any other streams in between) must be both readable and writable: a Duplex stream, most commonly the special kind called a Transform stream.
  • Key Point 1: You cannot pipe to a stream that is only Readable. Everything must start at a Readable stream.
  • Key Point 2: You cannot pipe a stream that is only Writable to anything else. A writable stream is where it ends. The data must exit a writable stream via the _write() method.

The only way to chain streams through each other is to put Duplex streams (usually Transform streams) in the middle. With me so far? Here is where it gets extremely confusing: Duplex, Transform and PassThrough
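
As an illustration of such a chain, here is a sketch that starts at a Readable, passes through two built-in Transform streams from the zlib module, and ends at a Writable. The roundTrip helper is hypothetical, and Readable.from() assumes a reasonably recent Node.js (12 or later).

```javascript
const { Readable, Writable } = require('stream');
const zlib = require('zlib');

// Hypothetical helper: sends text through gzip and gunzip and hands back the result.
function roundTrip(text, done) {
  const source = Readable.from([text]);  // Readable: where the chain starts
  const gzip = zlib.createGzip();        // Transform: writable input, readable output
  const gunzip = zlib.createGunzip();    // Transform
  const chunks = [];
  const sink = new Writable({            // Writable: where the chain ends
    write(chunk, encoding, callback) {
      chunks.push(chunk);
      callback();
    }
  });

  // pipe() returns its destination, which is why the calls can be chained.
  const last = source.pipe(gzip).pipe(gunzip).pipe(sink);

  sink.on('finish', function () {
    done(Buffer.concat(chunks).toString('utf8'), last === sink);
  });
}

roundTrip('hello', function (text, returnedSink) {
  console.log(text, returnedSink); // prints "hello true"
});
```

Note that pipe() does not forward errors from one stream to the next, so real code would attach an 'error' handler to each stream in the chain.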

Duplex

  • A duplex stream is a readable and writable stream combined. When you pipe a duplex stream (or read from a duplex stream), it operates as a Readable stream. When you pipe to a duplex stream, it operates the exact way a Writable stream does.
  • Key Point 1: The example 'streamA.pipe(duplexB).pipe(streamC)' means that data is read from Readable streamA's _read() method and sent to duplexB's _write() method. It does NOT go to streamC. It also means that data read from duplexB's _read() method goes to streamC. The syntax is confusing because it looks like the data is going in a line from streamA to streamC.
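  • Key Point 2: It is super confusing when using duplex streams whether to call .push(null) or .end() to end the stream, and whether to listen for the 'end' or the 'finish' event. The short version: .end() and 'finish' belong to the writable side, while .push(null) and 'end' belong to the readable side. On a plain Duplex the two sides are independent, so calling end() does not implicitly do a .push(null). A Transform stream does that for you once its writable side has ended.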

Both of these key points make using Duplex streams extremely confusing. In fact, I wanted a bi-directional stream that worked exactly as above, so I created my own. I call it the 'link-stream', and it doesn't actually use the _read or _write methods. It takes data from streamA and pipes it to streamC and vice versa in full duplex mode, and you can listen for either the 'finish' or the 'end' event; it doesn't matter. It's a true bi-directional passthrough pipe.
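
To make the two independent halves of a Duplex concrete, here is a small sketch. It is not the link-stream mentioned above; createSplitDuplex and demo are hypothetical names, and the two sides are deliberately unrelated to each other.

```javascript
const { Duplex } = require('stream');

// Hypothetical Duplex: the writable side records what it is given,
// while the readable side emits one fixed piece of text.
function createSplitDuplex(written) {
  let sent = false;
  return new Duplex({
    write(chunk, encoding, callback) { // writable side: what gets piped IN
      written.push(chunk.toString());
      callback();
    },
    read() {                           // readable side: what gets piped OUT
      if (sent) {
        this.push(null);               // ends the readable side, leading to 'end'
      } else {
        sent = true;
        this.push('from the readable side');
      }
    }
  });
}

function demo(done) {
  const written = [];
  const events = [];
  let output = '';
  const duplex = createSplitDuplex(written);

  function check() {
    if (events.length === 2) done(written, output, events);
  }

  duplex.on('data', function (chunk) { output += chunk; });
  duplex.on('finish', function () { events.push('finish'); check(); }); // follows end()
  duplex.on('end', function () { events.push('end'); check(); });       // follows push(null)

  duplex.write('to the writable side');
  duplex.end();
}

demo(function (written, output) {
  console.log(written); // [ 'to the writable side' ]
  console.log(output);  // from the readable side
});
```

What was written never shows up on the readable side, which is exactly why 'streamA.pipe(duplexB).pipe(streamC)' does not imply a straight line from streamA to streamC.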

Transform

  • A Transform stream is a Duplex stream whose readable output is computed from its writable input.
  • Calling write() on a transform stream calls _write() under the covers, which hands the chunk to _transform().
  • Calling this.push(...) inside _transform() places the result on the readable side. When a consumer asks for more data, _read() lets the next pending chunk go through _transform().
  • Basically all data paths lead to the _transform() method. You implement the _transform method. No matter how you use the stream, as a readable or as a writable, the data always goes through the same place, the _transform() method.
  • Once _transform() has pushed its output, that data is sent to any writable stream the transform is piped to.
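
A minimal sketch of that data path, assuming a hypothetical createUpperCase transform that upper-cases whatever text is written to it:

```javascript
const { Transform } = require('stream');

// Hypothetical transform: upper-cases the text written to it.
function createUpperCase() {
  return new Transform({
    // Passed as the `transform` option, this becomes the stream's _transform().
    transform(chunk, encoding, callback) {
      this.push(chunk.toString().toUpperCase()); // output goes to the readable side
      callback();                                // tells the stream this chunk is handled
    }
  });
}

function shout(text, done) {
  const upper = createUpperCase();
  let out = '';
  upper.on('data', function (chunk) { out += chunk; });
  upper.on('end', function () { done(out); }); // ending the writable side leads to 'end' here
  upper.write(text);
  upper.end();
}

shout('hello', function (result) {
  console.log(result); // prints "HELLO"
});
```

Because the same object is written to and read from, it can sit in the middle of a pipe chain, for example source.pipe(createUpperCase()).pipe(destination).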

PassThrough

  • This is just a Transform stream whose _transform method passes every chunk through unchanged.
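
For comparison, a short sketch using the built-in PassThrough class; passAlong is a hypothetical helper that only shows the data coming out exactly as it went in.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper: writes text into a PassThrough and reads it back out.
function passAlong(text, done) {
  const tap = new PassThrough();
  let out = '';
  tap.on('data', function (chunk) { out += chunk; });
  tap.on('end', function () { done(out); });
  tap.end(text); // write one chunk and end the writable side
}

passAlong('hello', function (result) {
  console.log(result); // prints "hello"
});
```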

So there you have it. I really hope that the Joyent folks clean up Duplex and make it less confusing, and I really hope they add a bi-directional PassThrough, so I don't have to use the link-stream approach I described above.

Good Luck!

answered Oct 12 '22 11:10 by datasedai