
Understanding Node JS Generators with fs Module

I've been very excited about Node JS for a while. I finally decided to knuckle down and write a test project to learn about generators in the latest Harmony build of Node.

Here is my very simple test project:

https://github.com/kirkouimet/project-node

To run my test project, pull the files from GitHub and then run it with:

node --harmony App.js

Here's my problem: I can't seem to get Node's asynchronous fs.readdir method to run inline with generators. Other projects out there, such as Galaxy and suspend, seem to be able to do it.

Here is the block of code I need to fix. I want to be able to instantiate an object of type FileSystem and call the .list() method on it:

https://github.com/kirkouimet/project-node/blob/4c77294f42da9e078775bb84c763d4c60f21e1cc/FileSystem.js#L7-L11

FileSystem = Class.extend({

    construct: function() {
        this.currentDirectory = null;
    },

    list: function*(path) {
        var list = yield NodeFileSystem.readdir(path);

        return list;
    }

});

Do I need to do something ahead of time to convert Node's fs.readdir into a generator?

One important note: I am parsing all class functions as they are created. This lets me handle generator functions differently than normal functions:

https://github.com/kirkouimet/project-node/blob/4c77294f42da9e078775bb84c763d4c60f21e1cc/Class.js#L31-L51

I've been really stumped with this project. Would love any assistance!

Here is what I am trying to accomplish:

  1. Heavy use of classes with a modified version of John Resig's JavaScript Class support with inheritance
  2. Using generators to get inline support for Node's stock async calls

Edit

I've tried to implement your example function and I am running into some trouble.

list: function*(path) {
    var list = null;

    var whatDoesCoReturn = co(function*() {
        list = yield readdir(path);
        console.log(list); // This shows an array of files (good!)
        return list; // Just my guess that co should get this back, it doesn't
    })();
    console.log(whatDoesCoReturn); // This returns undefined (sad times)

    // I need to use `list` right here

    return list; // This returns as null
}

Kirk Ouimet asked Mar 19 '14 00:03



2 Answers

First and foremost, it is important to have a good model in your head of exactly what a generator is. A generator function is a function that returns a generator object, and that generator object will step through yield statements within the generator function as you call .next() on it.

Given that description, you should notice that asynchronous behavior is not mentioned. Any action on a generator on its own is synchronous. You can run to the first yield immediately and then do a setTimeout and then call .next() to go to the next yield, but it is the setTimeout that causes asynchronous behavior, not the generator itself.
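As a minimal sketch of that model (the `steps` generator and its values are made up for illustration), stepping a generator by hand shows that each .next() call runs synchronously to the next yield and returns an object with `value` and `done` properties:

```javascript
// Hypothetical generator used only to illustrate stepping with .next().
function* steps() {
    // Pauses here; the argument of the following .next() call becomes
    // the result of this yield expression.
    var received = yield 'first';
    yield 'second: ' + received;
    return 'done';
}

var gen = steps();          // nothing inside steps() has run yet
var a = gen.next();         // runs to the first yield
var b = gen.next('hello');  // 'hello' becomes the result of the first yield
var c = gen.next();         // runs to the return statement

console.log(a); // { value: 'first', done: false }
console.log(b); // { value: 'second: hello', done: false }
console.log(c); // { value: 'done', done: true }
```

Nothing here is asynchronous: the generator only moves when something outside it calls .next().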

So let's cast this in the light of fs.readdir. fs.readdir is an async function, and using it in a generator on its own will have no effect. Let's look at your example:

function * read(path){
    return yield fs.readdir(path);
}

var gen = read(path);
// gen is now a generator object.

var first = gen.next();
// The generator evaluated fs.readdir(path) and yielded its return value, so
// first.value === undefined since fs.readdir returns nothing.

var final = gen.next();
// final.value === undefined as well, because you are returning the result
// of 'yield', and that is the value passed into .next(), and you are not
// passing anything to it.

Hopefully this makes it clearer that you are still calling readdir immediately and without passing any callback, so its result never reaches the generator. Newer versions of Node throw a TypeError when the callback is missing.

So how do you get nice behavior from generators?

Generally this is accomplished by having the generator yield a special object that represents the result of readdir before the value has actually been calculated.

As an (unrealistic) example, yielding a function is a simple way to yield something that represents the value.

function * read(path){
    return yield function(callback){
        fs.readdir(path, callback);
    };
}

var gen = read(path);
// gen is now a generator object.

var first = gen.next();
// first.value is the yielded function(callback){ ... }.

// Call it to start readdir; the callback resumes the generator with the result.
first.value(function(err, dir){
  var dirData = gen.next(dir);
  // dirData.value is just 'dir' since we are directly returning the yielded value.

  // Do whatever.
});

Really, you would want this type of logic to keep calling the generator until all of the yield calls are done, rather than hard-coding each call. The main thing to notice, though, is that the generator itself now looks synchronous, and everything outside the read function is completely generic.
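A minimal sketch of such a loop, assuming the generator only ever yields functions of the form function(callback){ ... }. The names `run` and `readdirThunk` are hypothetical helpers written for this illustration, not part of Node or any library:

```javascript
var fs = require('fs');

// Hypothetical helper: drives a generator that yields callback-taking
// functions, then reports the generator's return value (or error) to `done`.
function run(generatorFunction, done) {
    var gen = generatorFunction();

    function step(err, value) {
        var result;
        try {
            // Throw errors into the generator so it can use try/catch;
            // otherwise resume it with the value from the last callback.
            result = err ? gen.throw(err) : gen.next(value);
        } catch (ex) {
            return done(ex);
        }
        if (result.done) {
            return done(null, result.value);
        }
        // result.value is the yielded function; `step` becomes its callback.
        result.value(step);
    }

    step();
}

// Hypothetical wrapper that defers fs.readdir until a callback is supplied.
function readdirThunk(path) {
    return function(callback) {
        fs.readdir(path, callback);
    };
}

run(function* () {
    var files = yield readdirThunk('.');
    return files.length;
}, function(err, count) {
    console.log(err || 'entries: ' + count);
});
```

The generator body stays linear, while `run` owns all of the callback plumbing and works for any number of yields.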

You need some kind of generator wrapper function that handles this process of consuming yielded values, and the suspend library you mentioned does exactly this. Another example is co.

The standard way to "return something representing the value" is to return a promise or a thunk, since returning a hand-written function like I did is kind of ugly.

With the thunkify and co libraries, you would do the above without the hand-written function:

var thunkify = require('thunkify');
var co = require('co');
var fs = require('fs');
var readdir = thunkify(fs.readdir);

co(function * (){
    // `readdir` will call the node function, and return a thunk representing the
    // directory, which is then `yield`ed to `co`, which will wait for the data
    // to be ready, and then it will start the generator again, passing the value
    // as the result of the `yield`.
    var dirData = yield readdir(path);

    // Do whatever.
})(function(err, result){
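    // This callback is called once the synchronous-looking generator has
    // returned or thrown an exception. (This is the co 3.x calling style;
    // co 4 and later return a promise instead of taking a callback here.)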
});

Update

Your update still has some confusion. If you want your list function to be a generator, then you will need to use co outside of list wherever you are calling it. Everything inside of co should be generator-based and everything outside co should be callback-based. co does not make list automatically asynchronous. co is used to translate a generator-based async flow control into callback-based flow control.

e.g.

list: function(path, callback){
    co(function * (){
      var list = yield readdir(path);

      // Use `list` right here.

      return list;
    })(function(err, result){
      // err here would be set if your 'readdir' call had an error
      // result is the return value from 'co', so it would be 'list'.

      callback(err, result);
    })
}

loganfsmyth answered Nov 04 '22 08:11


@loganfsmyth already provides a great answer to your question. The goal of my answer is to help you understand how JavaScript generators actually work, as this is a very important step to using them correctly.

Generators implement a state machine, a concept that is nothing new by itself. What's new is that generators let you use familiar JavaScript language constructs (e.g., for, if, try/catch) to implement a state machine without giving up the linear code flow.

The original goal for generators is to generate a sequence of data, which has nothing to do with asynchrony. Example:

// with generator

function* sequence()
{
    var i = 0;
    while (i < 10)
        yield ++i * 2;
}

for (var j of sequence())
    console.log(j);

// without generator

function bulkySequence()
{
    var i = 0;
    var nextStep = function() {
        if ( i >= 10 )
            return { value: undefined, done: true };
        return { value: ++i * 2, done: false };
    }
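    // for...of requires an iterable, so the iterator returns itself
    // from its Symbol.iterator method.
    var iterator = { next: nextStep };
    iterator[Symbol.iterator] = function() { return iterator; };
    return iterator;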
}

for (var j of bulkySequence())
    console.log(j);

The second part (bulkySequence) shows how to implement the same state machine in the traditional way, without generators. In this case, we are no longer able to use a while loop to generate values, and the continuation happens via the nextStep callback. This code is bulkier and harder to read.

Let's introduce asynchrony. In this case, the continuation to the next step of the state machine will be driven not by a for...of loop, but by some external event. I'll use a timer interval as the source of the event, but it may as well be a Node.js operation completion callback, or a promise resolution callback.

The idea is to show how it works without using any external libraries (like Q, Bluebird, Co, etc.). Nothing stops the generator from driving itself to the next step, and that's what the following code does. Once all steps of the asynchronous logic have completed (the 10 timer ticks), doneCallback will be invoked. Note that I don't return any meaningful data with yield here. I merely use it to suspend and resume the execution:

function workAsync(doneCallback)
{
    var worker = (function* () {
        // the timer callback drives the generator to the next step
        var interval = setInterval(function() { 
            worker.next(); }, 500);

        try {
            var tick = 0;
            while (tick < 10 ) {
                // resume upon next tick
                yield null;
                console.log("tick: " + tick++);
            }
            doneCallback(null, null);
        }
        catch (ex) {
            doneCallback(ex, null);
        }
        finally {
            clearInterval(interval);
        }
    })();

    // initial step
    worker.next();
}

workAsync(function(err, result) { 
    console.log("Done, any error: " + err); });

Finally, let's create a sequence of events:

function workAsync(doneCallback)
{
    var worker = (function* () {
        // the timer callback drives the generator to the next step
        setTimeout(function() { 
            worker.next(); }, 1000);

        yield null;
        console.log("timer1 fired.");

        setTimeout(function() { 
            worker.next(); }, 2000);

        yield null;
        console.log("timer2 fired.");

        setTimeout(function() { 
            worker.next(); }, 3000);

        yield null;
        console.log("timer3 fired.");

        doneCallback(null, null);
    })();

    // initial step
    worker.next();
}

workAsync(function(err, result) { 
    console.log("Done, any error: " + err); });

Once you understand this concept, you can move on with using promises as wrappers for generators, which takes it to the next powerful level.
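As a rough sketch of that next step, assuming the generator yields promises (or plain values), a small driver can resume it whenever a yielded promise settles. `runAsync` and `delay` are hypothetical names for this illustration; libraries such as co implement the same idea more thoroughly:

```javascript
// Hypothetical helper: drives a generator that yields promises and returns
// a promise for the generator's return value.
function runAsync(generatorFunction) {
    return new Promise(function(resolve, reject) {
        var gen = generatorFunction();

        function step(method, arg) {
            var result;
            try {
                // method is 'next' (resume with a value) or 'throw'
                // (resume by raising the rejection inside the generator).
                result = gen[method](arg);
            } catch (ex) {
                return reject(ex);
            }
            if (result.done) {
                return resolve(result.value);
            }
            // Wait for the yielded promise, then resume the generator.
            Promise.resolve(result.value).then(
                function(value) { step('next', value); },
                function(err) { step('throw', err); }
            );
        }

        step('next');
    });
}

// Hypothetical timer wrapper standing in for any asynchronous operation.
function delay(ms, value) {
    return new Promise(function(resolve) {
        setTimeout(function() { resolve(value); }, ms);
    });
}

runAsync(function* () {
    var a = yield delay(100, 'timer1');
    var b = yield delay(100, 'timer2');
    return [a, b];
}).then(function(result) {
    console.log(result); // [ 'timer1', 'timer2' ]
});
```

Compared with the timer examples above, the generator no longer has to call worker.next() itself; the driver does it whenever the yielded promise resolves or rejects.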

noseratio answered Nov 04 '22 08:11