Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I detect all dependencies of a function in Node.js?

I'm trying to give a broad picture of my problem. I need to write a program with Node.js that should be able to detect all dependencies a function.

E.g.

function a() {
   //do something
   b();
};

function b() {
  console.log("Hey, This is b");
};

At the example above I need to have an JSON like this:

{
    "a": {
        dependencies: ["b"],
        range: [1, 4]
    },
    "b": {
        dependencies: [],
        range: [5, 8]
    }
}

In the dependencies property I need to have an array of functions that called inside the function, and by range I mean the line range of function definition.

I need a solution to achieve this goal. Are there any tools or plugins for Node.js?

like image 347
Afshin Mehrabani Avatar asked Jul 28 '13 05:07

Afshin Mehrabani


People also ask

How do I manage dependencies in node JS?

Option 2: Specify a new dependency in package. Another way to install a new package is to specify it as a dependency in package. json , then run npm install with no arguments. The new dependency and all of its dependencies will be installed.

What are Nodejs dependencies?

The dependencies value is used to specify any other modules that a given module (represented by the package. json ) requires to work. When you run npm install from the root folder of a given module, it will install any modules listed in that dependencies object.

How can you make sure your dependencies are safe node JS?

The only option is to automate the update / security audit of your dependencies. For that there are free and paid options: npm outdated.


1 Answers

(I apologise in advance: I usually try and make my answers humorous to ease the reader through them, but I couldn't successfully do so in this case. Consider that a double apology for the length of this answer.)

0. TL;DR (for "normal people") of the problem

This is not an easy problem. Instead of solving it in full, we will limit its scope - we will only solve the portion of the problem we care about. We will do so by parsing the input with a JavaScript parser and going over it with a simple recurive-descent algorithm. Our algorithm will analyse the program's scope and correctly identify function calls.

All the rest is just filling in the blanks! The result is at the bottom of the answer, so I recommend you grab to the first comment if you don't want to read through.

1. Limiting the problem

As Benjamin Gruenbaum's answer says, this is a very, very hard problem because of JavaScript's dynamic nature. However, what if instead of making a solution which'll work for 100% of programs, we instead do it for a subset of programs, if we limit ourselves to handle certain things?

The most important limitation:

  • No eval. If we include eval, it's a spiral into chaos. This is because eval let's you use arbitrary strings which makes tracking dependencies impossible without checking every possible input. In NodeJS there are no document.writes and setTimeout only accepts a function so we don't have to worry about those. However, we also disallow the vm module.

The following limitations are to ease the process. They may be solvable, but solving them is out of scope for this answer:

  1. No dynamic keys obj[key]() it saddens me to introduce this limitation, but it's definitely solvable for some cases (e.g. key = 'foo' but not key = userInput())
  2. Variables are not shadows, no var self = this. Definitely solvable with a complete scope parser.
  3. No funky expressions, e.g. (a, b)()

And finally, limitations to the implementation in this answer - either because of complexity constraints or time constraints (but they are very solvable):

  1. No hoisting, so function declarations won't bob up in the scope.
  2. No object handling. This sucks, but handling things like foo.bar() or this.foo() would've at least doubled the program complexity. Put in enough time, and it's very doable.
  3. Only function scope is honored. There're ways in JavaScript to define scopes other than functions (the with statement, catch blocks). We don't deal with them.

In this answer, I'll outline (and provide) a proof-of-concept parser.

2. Approaching the problem

Given a program, how can we decipher its function dependencies?

//A. just a global function
globalFunction();
//B. a function within a function
var outer = function () {
    function foo () {}
    foo();
};
//C. calling a function within itself
var outer = function inner () {
    inner();
};
//D. disambiguating between two identically named functions
function foo () {
    var foo = function () {};
    foo();
}
foo();

In order to understand a program, we need to break its code apart, we need to understand its semantics: we need a parser. I've chosen acorn because I've never used it and heard good praise. I suggest you play with it a bit, see what programs look like in SpiderMonkeys's AST.

Now that we have a magical parser which transforms JavaScript into an AST (an Abstract Syntax Tree), how will we logically handle finding dependencies? We'll need do two things:

  1. Build scopes correctly
  2. Understand which function a function call refers to.

We can see why example D above can be ambiguous: There are two functions called foo, how can we know which one foo() means? That's why we need to implement scoping.

3. Solving the problem

Since the solution is in two parts, let's solve it that way. Beginning from the biggest problem:

3.1. Scoping

So...we have an AST. It has a bunch of nodes. How do we build a scope? Well, we only care about function scope. That eases the process, as we know we only have to deal with functions. But before we talk about how to use scopes, let's define the function which makes scopes.

What does a scope have? It's not a complex being: It has a parent scope (or null if it's the global scope), and it has the items it contains. We need a way to add things to a scope, and get things from one. Let's do that:

var Scope = function (parent) {
    var ret = { items : {}, parent : parent, children : [] };

    ret.get = function (name) {
        if (this.items[name]) {
            return this.items[name];
        }

        if (this.parent) {
            return this.parent.get(name);
        }

        //this is fake, as it also assumes every global reference is legit
        return name;
    };

    ret.add = function (name, val) {
        this.items[name] = val;
    };

    if (parent) {
        parent.children.push(ret);
    }
    return ret;
};

As you may have noticed, I'm cheating in two aspects: First, I'm assigning child scopes. That is to make it easier for us measly humans to see that things are working (otherwise, all scoping would be internal, we'd only see the global scope). Second, I'm assuming the global scope contains all - that is, if foo isn't defined in any scope, then it must be an existing global variable. That may or may not be desirable.

OK, we have a way to represent scopes. Don't crack open the champagne yet, we still have to actually make them! Let's see how a simple function declaration, function f(){} looks like in AST:

{
  "type": "Program",
  "start": 0,
  "end": 14,
  "body": [{
    "type": "FunctionDeclaration",
    "start": 0,
    "end": 14,
    "id": {
      "type": "Identifier",
      "start": 9,
      "end": 10,
      "name": "f"
    },
    "params": [],
    "body": {
      "type": "BlockStatement",
      "start": 12,
      "end": 14,
      "body": []
    }
  }]
}

That's quite a mouthful, but we can brave through it! The juicy part is this:

{
  "type": "FunctionDeclaration",
  "id": {
    "type": "Identifier",
    "name": "f"
  },
  "params": [ ... ],
  "body": { ... }
}

We have a FunctionDeclaration node with an id property. That id's name is our function's name! Let's assume we have a function walk which takes care of walking over nodes, and currentScope and currentFuncName variables, and we've just arrived at parsing our function declaration node. How do we do it? Code speaks louder than words:

//save our state, so we will return to it after we handled the function
var cachedScope = currentScope,
    cachedName = currentFuncName;

//and now we change the state
currentScope = Scope(cachedScope);
currentFuncName = node.id.name;

//create the bindings in the parent and current scopes
//the following lines have a serious bug, we'll get to it later (remember that
// we have to meet Captain Crunchypants)
cachedScope.add(currentFuncName, currentName);
currentScope.add(currentFuncName, currentName);

//continue with the parsing
walk(node.body);

//and restore the state
currentScope = cachedScope;
currentFuncName = cachedName;

But wait, what about function expressions? They behave a bit differently! First and foremost, they don't necessarily have a name, and if they do, it's only visible inside them:

var outer = function inner () {
    //outer doesn't exist, inner is visible
};
//outer is visible, inner doesn't exist

Let's make another huge assumption that we've dealt with the variable declaration part - we created the proper binding at the parent scope. Then, the logic above for handling functions changes only slightly:

...
//and now we change the state
currentScope = Scope(cachedScope);
//we  signify anonymous functions with <anon>, since a function can never be called that
currentFuncName = node.id ? node.id.name : '<anon>';
...
if (node.id) {
    currentScope.add(currentFuncName, currentFuncName);
}
if (node.type === 'FunctionDeclaration') {
    cachedScope.add(currentFuncName, currentFuncName);
}
...

And believe it or not, that's more or less the entire scope handling mechanism in the final solution. I expect as you add things like objects it'll get more a bit more complicated, but it not by much.

It's time to meet Captain Crunchpants. The very observant listener will by now have remembered example D. Let's freshen our memory:

function foo () {
    function foo () {}
    foo();
}
foo();

In parsing that, we need a way to tell the outer foo and the inner foo apart - otherwise, we won't be able to know which of these foo calls, and our dependency finder will be toast. Furthermore, we won't be able to tell them apart in the dependency management - if we just add to the results by function name, we'll get overwriting. In other words, we need an absolute function name.

I chose to represent nesting with separation with a # character. The above, then, has a function foo, with an inner function foo#foo, with a call to foo#foo and a call to foo. Or, for a less confusing example:

var outer = function () {
    function inner () {}
    inner();
};
outer();

Has a function outer and a function outer#inner. There's a call to outer#inner and a call to outer.

So, let's create this function which takes the previous name, and the current function's name, and mushes them together:

function nameToAbsolute (parent, child) {
    //foo + bar => foo#bar
    if (parent) {
        return parent + '#' + name;
    }
    return name;
}

And modify our function handling pseudo-code (which is about to come to life! I promise!):

...
currentScope = Scope(cachedScope);
var name = node.id ? node.id.name : '<anon>';
currentFuncName = nameToAbsolute(cachedName, name);
...
if (node.id) {
    currentScope.add(name, currentFuncName);
}
if (node.type === 'FunctionDeclaration') {
    cachedScope.add(name, currentFuncName);
}

Now we're talking! It's time to move on to actually doing something! Maybe I've lied to you all along and I know nothing, maybe I failed miserably and I continued writing until now because I knew nobody will read this far and I'll get many upvotes because it's a long answer!?

HAH! Keep dreming! There's much more to come! I didn't sit on this for a few days for no reason! (As an interesting social experiment, could anyone upvoting comment, saying something around the lines "Captain Crunchpants was happy to see you"?)

On a more serious note, we should begin making the parser: What holds our state and walks over nodes. Since we'll have two parsers at the end, scope and dependency, we'll make a "master parser" which calls each one when needed:

var parser = {
    results : {},
    state : {},

    parse : function (string) {
        this.freshen();

        var root = acorn.parse(string);
        this.walk(root);

        return this.results;
    },

    freshen : function () {
        this.results = {};
        this.results.deps = {};

        this.state = {};
        this.state.scope = this.results.scope = Scope(null);
        this.state.name = '';
    },

    walk : function (node) {
        //insert logic here
    },

    // ''    =>  'foo'
    // 'bar' =>  'bar#foo'
    nameToAbsolute : function (parent, name) {
        return parent ? parent + '#' + name : name;
    },

    cacheState : function () {
        var subject = this.state;
        return Object.keys( subject ).reduce(reduce, {});

        function reduce (ret, key) {
            ret[key] = subject[key];
            return ret;
        }
    },
    restoreState : function (st) {
        var subject = this.state;

        Object.keys(st).forEach(function (key) {
            subject[key] = st[key];
        });
    }
};

That's a bit of cruft, but hopefully it's understandable. We made state into an object, and to make it flexible, cacheState and restoreState are simply cloning/merging.

Now, for our beloved scopeParser:

var scopeParser = {
    parseFunction : function (func) {
        var startState = parser.cacheState(),

            state = parser.state,
            name = node.id ? node.id.name : '<anon>';

        state.scope = Scope(startState.scope);
        state.name = parser.nameToAbsolute(startState.name, name);

        if (func.id) {
            state.scope.add(name, state.name);
        }
        if (func.type === 'FunctionDeclaration') {
            startState.scope.add(name, state.name);
        }

        this.addParamsToScope(func);
        parser.walk(func.body);

        parser.restoreState(startState);
    }
};

The casually observant reader will notice that parser.walk is empty. Time to fill 'er up!

walk : function (node) {
    var type = node.type;

    //yes, this is tight coupling. I will not apologise.
    if (type === 'FunctionDeclaration' || type === 'FunctionExpression') {
        scopeParser.parseFunction(node)
    }
    else if (node.type === 'ExpressionStatement') {
        this.walk(node.expression);
    }
    //Program, BlockStatement, ...
    else if (node.body && node.body.length) {
        node.body.forEach(this.walk, this);
    }
    else {
        console.log(node, 'pass through');
    }
    //...I'm sorry
}

Again, mostly technicalities - to understand these, you need to play with acorn. We want to make sure we iterate and walk into nodes correctly. Expressions Nodes like (function foo() {}) has an expression property we walk over, BlockStatement Nodes (e.g. the actual body of a function) and Program Nodes have a body array, etc.

Since we have something resembling logic, let's try:

> parser.parse('function foo() {}').scope
{ items: { foo: 'foo' },
  parent: null,
  children:
   [ { items: [Object],
       parent: [Circular],
       children: [],
       get: [Function],
       add: [Function] } ],
  get: [Function],
  add: [Function] }

Neat! Play around with function declarations and expressions, see that they're nested correctly. We did however forget to include variable declaration:

var foo = function () {};
bar = function () {};

A good (and fun!) exercise is adding them yourself. But don't worry - they'll be included in the final parser;

Who'd believe!? We're done with scopes! D-O-N-E! Let's do a cheer!

Oh oh oh...where did you think you're going!? We only solved part of the problem - we still have to find the dependencies! Or did you forget all about it!? Fine, you can go to the toilet. But it better be #1.

3.2. Dependency

Wow, did you even remember we had section numbers? On an unrelated note, when I typed the last sentence, my keyboard made a sound reminiscent of the first note of the Super Mario Theme Song. Which is now stuck in my head.

Ok! So, we have our scopes, we have our function names, it's time to identify function calls! This will not take long. Doing acorn.parse('foo()') gives:

{
  "type": "Program",
  "body": [{
    "type": "ExpressionStatement",
    "expression": {
      "type": "CallExpression",
      "callee": {
        "type": "Identifier",
        "name": "f"
      },
      "arguments": []
    }
  }]
}

So we're looking for a CallExpression. But before we go all walk over it, let's first review our logic. Given this node, what do we do? How do we add a dependency?

This is not a difficult problem, as we already took care of all the scoping. We add to the dependencies of the containing function (parser.state.name) the scope resolution of callExpression.callee.name. Sounds simple!

var deps = parser.results.deps,
    scope = parser.state.scope,
    context = parser.state.name || '<global>';

if (!deps[context]) {
    deps[context] = [];
}

deps[context].push(scope.get(node.callee.name));

There're, once again, a trick with handling the global context. If the current state is nameless, we assume it's the global context and give it a special name <global>.

Now that we have that, let's build our dependencyParser:

var dependencyParser = {
    parseCall : function (node) {
         ...the code above...
    }
};

Truly beautiful. We still need to modify parser.walk to include CallExpressions:

walk : function (node) {
    ...
    else if (type === 'CallExpression') {
        dependencyParser.parseCall(node);
    }
}

And try it out on example D:

> parser.parse('function foo() { var foo = function () {}; foo(); } foo()').deps
{ foo: [ 'foo#foo' ], '<global>': [ 'foo' ] }

4. Mock the problem

HAHA! IN YOUR FACE, PROBLEM! WOOOOOOOOOOO!

You may commence celebrations. Remove your pants, run around in the city, claim you're the town chicken and burn stray garbage cans (Zirak and Affiliates in no way support arson of any kind or indecent exposure. Any action taken by oh, say, any reader is not to be blamed upon Zirak and/or Affiliates).

But seriously now. We solved a very, very limited subset of the problem, and to solve it for a small percentage of real-case scenarios there are a lot of things which have to be done. This is not a discouragement - quite the opposite! I urge you to try and do this. It's fun! (Zirak and Affiliates are in no way responsible for any mental breakdown as a result from trying to to what was just said)

Presented here is the source code of the parser, sans any NodeJS specific stuff (i.e. requiring acorn or exposing the parser):

var parser = {
    results : {},
    state : {},

    verbose : false,

    parse : function (string) {
        this.freshen();

        var root = acorn.parse(string);
        this.walk(root);

        return this.results;
    },

    freshen : function () {
        this.results = {};
        this.results.deps = {};

        this.state = {};
        this.state.scope = this.results.scope = Scope(null);
        this.state.name = '';
    },

    walk : function (node) {
        var type = node.type;

        //yes, this is tight coupling. I will not apologise.
        if (type === 'FunctionDeclaration' || type === 'FunctionExpression') {
            scopeParser.parseFunction(node)
        }
        else if (type === 'AssignmentExpression') {
            scopeParser.parseBareAssignmentExpression(node);
        }
        else if (type === 'VariableDeclaration') {
            scopeParser.parseVarDeclaration(node);
        }
        else if (type === 'CallExpression') {
            dependencyParser.parseCall(node);
        }
        else if (node.type === 'ExpressionStatement') {
            this.walk(node.expression);
        }
        //Program, BlockStatement, ...
        else if (node.body && node.body.length) {
            node.body.forEach(this.walk, this);
        }
        else if (this.verbose) {
            console.log(node, 'pass through');
        }
        //...I'm sorry
    },

    // ''    =>  'foo'
    // 'bar' =>  'bar#foo'
    nameToAbsolute : function (parent, name) {
        return parent ? parent + '#' + name : name;
    },

    cacheState : function () {
        var subject = this.state;
        return Object.keys( subject ).reduce(reduce, {});

        function reduce (ret, key) {
            ret[key] = subject[key];
            return ret;
        }
    },
    restoreState : function (st) {
        var subject = this.state;

        Object.keys(st).forEach(function (key) {
            subject[key] = st[key];
        });
    }
};

var dependencyParser = {
    //foo()
    //yes. that's all.
    parseCall : function (node) {
        if (parser.verbose) {
            console.log(node, 'parseCall');
        }

        var deps = parser.results.deps,
            scope = parser.state.scope,
            context = parser.state.name || '<global>';

        if (!deps[context]) {
            deps[context] = [];
        }

        deps[context].push(scope.get(node.callee.name));
    }
};

var scopeParser = {
    // We only care about these kinds of tokens:
    // (1) Function declarations
    // function foo () {}
    // (2) Function expressions assigned to variables
    // var foo = function () {};
    // bar = function () {};
    //
    // Do note the following property:
    // var foo = function bar () {
    //     `bar` is visible, `foo` is not
    // };
    // `bar` is not visible, `foo` is

    /*
      function foo () {}
        =>
      {
        "type": 'FunctionDeclaration',
        "id": {
          "type": Identifier,
          "name": 'foo'
        },
        "params": [],
        "body": { ... }
      }

      (function () {})
        =>
      {
        "type": "FunctionExpression",
        "id": null,
        "params": [],
        "body": { ... }
      }
    */
    parseFunction : function (func) {
        if (parser.verbose) {
            console.log(func, 'parseFunction');
        }
        var startState = parser.cacheState(),
            state = parser.state,
            name = this.grabFuncName(func);

        state.scope = Scope(startState.scope);
        state.name = parser.nameToAbsolute(startState.name, name);

        if (func.id) {
            state.scope.add(name, state.name);
        }
        if (func.type === 'FunctionDeclaration') {
            startState.scope.add(name, state.name);
        }

        this.addParamsToScope(func);
        parser.walk(func.body);

        parser.restoreState(startState);
    },

    grabFuncName : function (func) {
        if (func.id) {
            return func.id.name;
        }
        else if (func.type === 'FunctionExpression') {
            return '<anon>';
        }
        else {
            //...this shouldn't happen
            throw new Error(
                'scope.parseFunction encountered an anomalous function: ' +
                    'nameless and is not an expression');
        }
    },

    /*
      [{
        "type": "Identifier",
        "name": "a"
      }, {
        "type": "Identifier",
        "name": "b"
      }, {
        "type": "Identifier",
        "name": "c"
      }]
    */
    addParamsToScope : function (func) {
        var scope = parser.state.scope,
            fullName = parser.state.name;

        func.params.forEach(addParam);

        function addParam (param) {
            var name = param.name;
            scope.add(name, parser.nameToAbsolute(fullName, name));
        }
    },

    parseVarDeclaration : function (tok) {
        if (parser.verbose) {
            console.log(tok, 'parseVarDeclaration');
        }

        tok.declarations.forEach(parseDecl, this);

        function parseDecl (decl) {
            this.parseAssignment(decl.id, decl.init);
        }
    },

    // Lacking a better name, this:
    // foo = function () {}
    // without a `var`, I call a "bare assignment"
    parseBareAssignmentExpression : function (exp) {
        if (parser.verbose) {
            console.log(exp, 'parseBareAssignmentExpression');
        }
        this.parseAssignment(exp.left, exp.right);
    },

    parseAssignment : function (id, value) {
        if (parser.verbose) {
            console.log(id, value, 'parseAssignment');
        }

        if (!value || value.type !== 'FunctionExpression') {
            return;
        }

        var name = id.name,
            val = parser.nameToAbsolute(parser.state.name, name);
        parser.state.scope.add(name, val);

        this.parseFunction(value);
    }
};

var Scope = function (parent) {
    var ret = { items : {}, parent : parent, children : [] };

    ret.get = function (name) {
        if (this.items[name]) {
            return this.items[name];
        }

        if (this.parent) {
            return this.parent.get(name);
        }

        //this is fake, as it also assumes every global reference is legit
        return name;
    };

    ret.add = function (name, val) {
        this.items[name] = val;
    };

    if (parent) {
        parent.children.push(ret);
    }
    return ret;
};

Now if you'll excuse me, I need a long shower.

like image 119
Zirak Avatar answered Oct 05 '22 09:10

Zirak