Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Programmatically rename functions

I am currently writing an ECMAScipt5 compiler that performs various given optimizations/transformations on a parse tree and compiles back to ECMAScipt5.

One functionality is to rename a Binding in an EnvironmentRecord.

This transformation may either be performed automatically e.g. as part of an optimization that aims to reduce code size, where each variable (not in the global scope) will be given the next shortest available name, or manually by an annotation after a statement that introduces a new scope.

However, I have to restrict the (automatic) process to variable declarations only.

Consider those two examples. The first, compiled, specifying [Minify] as transformations, the second one using [Directives, PrettyPrint]

Syntax: Compiler.fromSource (src).compile ([/*Array of transformations*/]);

var bar,foo;
(function exampleMin () {
    var bar="foo",
        foo="bar";
    
    function fooBar () {
        return foo + bar;
    }
})

compiles to

var bar,foo;function exampleMin(){var A="foo",B="bar";function fooBar(){return B+A}}

And

var bar,foo;
(function exampleMin () {
    @Rename bar:A
    @Rename foo:B
    @Rename fooBar:C
    var bar="foo",
        foo="bar";
    
    function fooBar () {
        return foo + bar;
    }
})

compiles to

var bar,foo;
function exampleMin(){
     var A="foo",B="bar";
     function C(){
          return B+A;
     }
};

Which leads to the problematic part, functions... consider the following

if (fooBar.name === 'fooBar') {
 //...
}

Now, if this statement would be contained in exampleMin. The user defined rename would have transformed the code into a semantically different code. Which must not ever happen by an automatic performed transformation.

While I blindly assume that user defined renaming of functions doesn't change the semantics somehow, I would like to produce a warning if that may be the case. But I don't know how to determine whether it's safe to rename a function programmatically or not.

This brings me down to the questions:

  1. What, besides accessing a functions name has to be considered when renaming a function?
  2. What analysis has to be performed to mark a function as either safely optimizable or not. Is it possible at all.
  3. Would I rather exclude functions from being renamed or would I try to change the other side of e.g. a comparison against a functions name too. (If it can be proved to have no side effects either)
  4. Would a change in the semantics be tolerable in such a specific case (GCC seems to think so), if I, in exchange, offer a @do-not-optimize annotation?

Update 1.

I have come to the conclusion that this analysis might be not possible solely through static analysis

Consider the following code.

function foo () {}
function bar () {}
var fns = [bar,foo];

if (fns [0].name === 'bar') fns [0] ();

fns.unshift (foo);

if (fns [1].name === 'bar') fns [1] ();

I can't imagine how to track the references back to it's origin once a function has been added to an array, without executing the code. Maybe i would need some form of Abstract Interpretation1?


Update2

In the mean-time and after reading @Genes answer, I realized there are other few things that may not hurt to be added. First, some side notes:

  • Apparently I am not writing a compiler, but rather a preprocessor since it outputs sourcecode and not machine code.
  • Given that there would only be static accesses of bound Identifiers, I have a good idea on how to approach the problem.
  • Each Binding in every environment record currently holds a list to all its static references (I obviously couldn't add dynamic ones)

I am currently working on the SSA[2] conversion. So I haven't yet implemented any DataFlow analysis yet. But that's on the plan.

So for the sake of simplicity, let's just assume the following prerequisites would be met.

  • AST and CFG are in Static Single Assignment form.

  • GEN and KILL sets have been computed for each node in the CFG4

  • Reaching Definitions4 / IN and OUT sets have been computed.

  • DEF / USE pairs have been computed

  • flow dependence edges have been added to the CFG

    So the Control Flow Graph(s) for the first example could look something like this.

enter image description here

  • The black, non-dotted lines represent control flow edges.
  • The black dotted connections represent dataflow dependencies
  • The blue double arrow lines represent call sites.
  • The blue dotted lines represent interprocedural dependencies. I'm however not sure if i should make a direct connection between the corresponding nodes of each precedures CFG

Given this, I could know simply perform the following.

For each function that is about to be renamed:

  • Visit its declarations CFG node
  • For each flow dependency edge visit the target node
  • If that node is a conditional goto statement and the functions reference is the LHS of a property accessor with the RHS being "name".
    • Mark the function as tainted

The only problem is I can't see how to compute (even approximate) that information for non-static references of a function.

Soo, if that analysis doesn't help finding ALL references to a function, i could as well use the beforementioned list of references, that each Binding in an environment record holds. Since a function has a declarative environment record as well as an object environment record. I could simply take a look at the count of references of its object environments "name" Binding.

As a reference, here is the actual code that currently performs the renaming

var boundIdentifiers = this.environment.record.bindings, //`this` refers to an AST node representing a FunctionDeclaration or a FunctionExpression
    nextName, 
    identifier, 
    binding;

for (identifier in boundIdentifiers) {
    binding = boundIdentifiers [identifier];
    if (binding.uses < 2 && !binding.FunctionExpression) {
        compiler.pushWarning (binding.references [0].line, binding.references [0].column,'Declared function ' + identifier + ' is never called.') //False positive if the functions reference is obtained dynamically
    }

    if (boundIdentifiers [identifier].FunctionDeclaration || boundIdentifiers [identifier].FunctionExpression) {
        continue; //Skip function declarations and expressions, since their name property could be accessed
    }

    do {
        nextName = nextVar (); 
    } while (
        Object.hasOwnProperty.call (boundIdentifiers,nextVar) //There could exist a `hasOwnProperty` binding.
    ); //ther could a with the name that already exists in the scope. So make sure we have assign a free name.

    this.environment.record.setBindingName (identifier, nextName);
}

So the overall problem boils down to catching non-static references

What analysis techniques and prior optimizations would need to be involved to catch at least some (since its not possible to catch em all), non-static references.


I modified the questions to fit the update. So, the above questions still apply

[1]A Survey of Static Program Analysis Techniques (CH: 2) [2]Static Single Assignment Book [4]Representation and Analysis of Software


As @didierc mentioned in the comments, the same problem arises for property accesses using the bracket notation. So Bindings of an object environment record can be renamed manually only.

like image 340
Moritz Roessler Avatar asked Jul 20 '14 17:07

Moritz Roessler


2 Answers

I think you need to break the .name property to return the original name instead of the new name. Nothing else will work.

Consider replacing all .name with ._name(), and constructing a lookup table of ref->name.

like image 72
Phil H Avatar answered Oct 09 '22 18:10

Phil H


The problem is not that renaming a function is unsafe, the problem is that code which depends on "name" property of a function is unsafe, regardless of the renaming you are considering.

Consider that a function object's 'name' property isn't defined in the ecma standard, and that some browsers therefor don't implement it. In JavaScript functions can be nameless, or may have several names, so taking a dependency on a browser-specific name property in code is the problem, not the concept of renaming functions.

like image 32
Burt_Harris Avatar answered Oct 09 '22 19:10

Burt_Harris