
Manipulating the V8 AST

I intend to implement JS code coverage directly in the V8 code. My initial target is to add a simple print for every statement in the abstract syntax tree. I saw that there is an AstVisitor class, which allows you to traverse the AST. So my question is: how can I add a statement to the AST after the statement the visitor is currently visiting?

user2240085 asked Apr 03 '13 11:04


1 Answer

OK, I'll summarize my experiments. First, what I write applies to V8 as it was used in Chromium r157275, so things may no longer work, but I'll nevertheless link to the corresponding places in the current version.

As said, you need your own AST visitor, say MyAstVisitor, which inherits from AstVisitor and must implement a bunch of VisitXYZ methods declared there. The only one needed to instrument/inspect executed code is VisitFunctionLiteral. Executed code is either a function or a set of loose statements in a source (file), which V8 wraps in a function that is then executed.

Then, just before a parsed AST is converted to code, here (compilation of the function made from the loose statements) and there (compilation at runtime, when a predefined function is executed for the first time), you pass your visitor to the function literal, which will call VisitFunctionLiteral on the visitor:

MyAstVisitor myAV(info);
info->function()->Accept(&myAV);
// next line is the V8 compile call
if (!MakeCode(info)) {

I passed the CompilationInfo pointer info to the custom visitor because one needs that to modify the AST. The constructor looks like this:

// keep the CompilationInfo, a node factory and the zone for later AST edits
MyAstVisitor(CompilationInfo* compInfo) :
    _ci(compInfo), _nf(compInfo->isolate(), compInfo->zone()), _z(compInfo->zone()) {}

_ci and _z are pointers to the CompilationInfo and the Zone; _nf is an AstNodeFactory<AstNullVisitor> held by value (it is constructed in place from the isolate and the zone).
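The Accept call shown earlier relies on double dispatch: the node knows its own type and calls the matching Visit method on whatever visitor it is handed. The following is only a toy sketch of that mechanism. Every name in it (the toy_visitor namespace, RecordingVisitor, the simplified Node hierarchy) is made up for illustration and is not V8's actual class layout.

```cpp
#include <string>
#include <vector>

namespace toy_visitor {

struct FunctionLiteral;
struct Literal;

// Hypothetical visitor interface with one method per node type.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void VisitFunctionLiteral(FunctionLiteral* node) = 0;
    virtual void VisitLiteral(Literal* node) = 0;
};

// Hypothetical node base class; each node forwards to "its" Visit method.
struct Node {
    virtual ~Node() = default;
    virtual void Accept(Visitor* v) = 0;
};

struct Literal : Node {
    void Accept(Visitor* v) override { v->VisitLiteral(this); }
};

struct FunctionLiteral : Node {
    void Accept(Visitor* v) override { v->VisitFunctionLiteral(this); }
};

// A visitor that only records which Visit method was called.
struct RecordingVisitor : Visitor {
    std::vector<std::string> seen;
    void VisitFunctionLiteral(FunctionLiteral*) override {
        seen.push_back("FunctionLiteral");
    }
    void VisitLiteral(Literal*) override { seen.push_back("Literal"); }
};

}  // namespace toy_visitor
```

Calling Accept on a toy_visitor::FunctionLiteral through a Node pointer ends up in RecordingVisitor::VisitFunctionLiteral, which is the same shape as info->function()->Accept(&myAV) landing in MyAstVisitor::VisitFunctionLiteral.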

Now in VisitFunctionLiteral you can iterate through the function body and also insert statements if you like.

void MyAstVisitor::VisitFunctionLiteral(FunctionLiteral* funLit){
    // fetch the function body
    ZoneList<Statement*>* body = funLit->body();
    // create a statement list used to collect the instrumented statements
    ZoneList<Statement*>* _stmts = new (_z) ZoneList<Statement*>(body->length(), _z);
    // iterate over the function body and rewrite each statement
    for (int i = 0; i < body->length(); i++) {
       // the rewritten statements are put into the collector
       rewriteStatement(body->at(i), _stmts);
    }
    // replace the original function body with the instrumented one
    body->Clear();
    body->AddAll(_stmts->ToVector(), _z);
}

In the rewriteStatement method you can now inspect the statement. The _stmts pointer holds the list of statements that will replace the original function body in the end. So to add a print statement after each statement, you first add the original statement and then add your own print statement:

void MyAstVisitor::rewriteStatement(Statement* stmt, ZoneList<Statement*>* collector){
    // add original statement
    collector->Add(stmt, _z);

    // create and add print statement, assuming you define print somewhere in JS:

    // 1) create handle (VariableProxy) for print function
    Vector<const char> fName("print", 5);
    Handle<String> fNameStr = Isolate::Current()->factory()->NewStringFromAscii(fName, TENURED);
    fNameStr = Isolate::Current()->factory()->SymbolFromString(fNameStr);
    // create the proxy - (it is vital to use _ci->function()->scope(), _ci->scope() crashes)
    VariableProxy* _printVP = _ci->function()->scope()->NewUnresolved(&_nf, fNameStr, Interface::NewUnknown(_z), 0);

    // 2) create message
    Vector<const char> tmp("Hello World!", 12);
    Handle<String> v8String = Isolate::Current()->factory()->NewStringFromAscii(tmp, TENURED);
    Literal* msg = _nf.NewLiteral(v8String);

    // 3) create argument list, call expression, expression statement and add the latter to the collector
    ZoneList<Expression*>* args = new (_z) ZoneList<Expression*>(1, _z);
    args->Add(msg, _z);  // ZoneList::Add takes the zone, as in the other Add calls
    Call* printCall = _nf.NewCall(_printVP, args, 0);
    ExpressionStatement* printStmt = _nf.NewExpressionStatement(printCall);
    collector->Add(printStmt, _z);
}

The last parameter of NewCall and NewUnresolved is a number specifying the position in the script. I assume it is used in debug/error messages to report where an error happened. At least I never ran into problems setting it to 0 (there is also a constant kNoPosition somewhere).

Some final words: this will not actually add a print statement after every statement, because a Block (e.g. a loop body) is itself a statement that holds a list of statements, and a loop is a statement that has a condition expression and a body block. So you need to check what kind of statement is currently being handled and recurse into it. Rewriting a block is pretty much the same as rewriting a function body.
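To make the recursion concrete, here is a minimal sketch on a toy AST. It is not V8 code: the toy_ast namespace, the Stmt/ExprStmt/Block/Loop types and the instrumentList/dumpList helpers are all assumptions invented for this example, and std::vector stands in for ZoneList. It only shows the shape of the idea: descend into nested blocks and loop bodies first, then append a print after each statement of the current list.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace toy_ast {

// Toy stand-ins for AST nodes (hypothetical, not V8's classes).
struct Stmt { virtual ~Stmt() = default; };
using StmtPtr = std::unique_ptr<Stmt>;
using StmtList = std::vector<StmtPtr>;

struct ExprStmt : Stmt {
    std::string text;
    explicit ExprStmt(std::string t) : text(std::move(t)) {}
};
struct Block : Stmt { StmtList body; };
struct Loop : Stmt {
    std::string cond;
    StmtPtr body;  // assumed to be a Block in this sketch
};

void instrumentList(StmtList& list);

// Recurse into statements that contain other statements.
void instrumentNested(Stmt* s) {
    if (auto* b = dynamic_cast<Block*>(s)) {
        instrumentList(b->body);
    } else if (auto* l = dynamic_cast<Loop*>(s)) {
        // A non-block loop body is left alone here; it would have to be
        // wrapped in a block before a print could follow it.
        instrumentNested(l->body.get());
    }
}

// Same pattern as the function-body rewrite: collect, then replace.
void instrumentList(StmtList& list) {
    StmtList out;
    out.reserve(list.size() * 2);
    for (auto& s : list) {
        instrumentNested(s.get());
        out.push_back(std::move(s));                        // original statement
        out.push_back(std::make_unique<ExprStmt>("print()"));  // injected print
    }
    list = std::move(out);
}

std::string dumpList(const StmtList& list);

// Render a statement as text so the result can be inspected.
std::string dumpStmt(const Stmt* s) {
    if (auto* e = dynamic_cast<const ExprStmt*>(s)) return e->text + ";";
    if (auto* b = dynamic_cast<const Block*>(s)) return "{" + dumpList(b->body) + "}";
    if (auto* l = dynamic_cast<const Loop*>(s))
        return "while(" + l->cond + ")" + dumpStmt(l->body.get());
    return "";
}

std::string dumpList(const StmtList& list) {
    std::string r;
    for (const auto& s : list) r += dumpStmt(s.get());
    return r;
}

}  // namespace toy_ast
```

For a toy program consisting of a() followed by a while loop whose block contains b(), dumpList yields a();print();while(c){b();print();}print(); after instrumentation. In V8 the type checks would go through the node's own type queries rather than dynamic_cast, and the nodes would be zone-allocated.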

But you will run into problems once you start to replace or modify existing statements, because the AST also carries branching information. If you replace a statement that is the jump target of some condition, you break the code. I guess this could be handled by adding rewriting capabilities directly to the individual expression and statement types instead of creating new nodes to replace them.

That's all so far; I hope it helps.

Jonas answered Nov 08 '22 15:11