I intend to implement JavaScript code coverage directly in the V8 source code.
My initial goal is to add a simple print for every statement in the abstract syntax tree.
I saw that there is an AstVisitor class, which allows you to traverse the AST.
So my question is: how can I add a statement to the AST after the statement the visitor is currently visiting?
Ok, I'll summarize my experiments. First, what I write applies to V8 as it was used in Chromium version r157275, so things may not work any more - but I'll nevertheless link to the places in the current version.
As said, you need your own AST visitor, say MyAstVisitor, which inherits from AstVisitor and must implement a bunch of VisitXYZ methods from there. The only one needed to instrument/inspect executed code is VisitFunctionLiteral. Executed code is either a function or a set of loose statements in a source (file), which V8 wraps in a function that is then executed.
Then, just before a parsed AST is converted to code, here (compilation of the function made from the loose statements) and there (compilation during runtime, when a predefined function is executed for the first time), you pass your visitor to the function literal, which will call VisitFunctionLiteral on the visitor:
MyAstVisitor myAV(info);
info->function()->Accept(&myAV);
// next line is the V8 compile call
if (!MakeCode(info)) {
I passed the CompilationInfo pointer info to the custom visitor because one needs that to modify the AST. The constructor looks like this:
MyAstVisitor(CompilationInfo* compInfo) :
_ci(compInfo), _nf(compInfo->isolate(), compInfo->zone()), _z(compInfo->zone()){};
_ci and _z are pointers to the CompilationInfo and the Zone; _nf is an AstNodeFactory<AstNullVisitor> member.
Now in VisitFunctionLiteral you can iterate through the function body and also insert statements if you like.
void MyAstVisitor::VisitFunctionLiteral(FunctionLiteral* funLit) {
  // fetch the function body
  ZoneList<Statement*>* body = funLit->body();
  // create a statement list used to collect the instrumented statements
  ZoneList<Statement*>* _stmts = new (_z) ZoneList<Statement*>(body->length(), _z);
  // iterate over the function body and rewrite each statement
  for (int i = 0; i < body->length(); i++) {
    // the rewritten statements are put into the collector
    rewriteStatement(body->at(i), _stmts);
  }
  // replace the original function body with the instrumented one
  body->Clear();
  body->AddAll(_stmts->ToVector(), _z);
}
In the rewriteStatement method you can now inspect the statement. The _stmts pointer holds a list of statements which in the end will replace the original function body. So to add a print statement after each statement you first add the original statement and then add your own print statement:
void MyAstVisitor::rewriteStatement(Statement* stmt, ZoneList<Statement*>* collector) {
  // add original statement
  collector->Add(stmt, _z);
  // create and add print statement, assuming you define print somewhere in JS:
  // 1) create handle (VariableProxy) for print function
  Vector<const char> fName("print", 5);
  Handle<String> fNameStr = Isolate::Current()->factory()->NewStringFromAscii(fName, TENURED);
  fNameStr = Isolate::Current()->factory()->SymbolFromString(fNameStr);
  // create the proxy - (it is vital to use _ci->function()->scope(), _ci->scope() crashes)
  VariableProxy* _printVP = _ci->function()->scope()->NewUnresolved(&_nf, fNameStr, Interface::NewUnknown(_z), 0);
  // 2) create message
  Vector<const char> tmp("Hello World!", 12);
  Handle<String> v8String = Isolate::Current()->factory()->NewStringFromAscii(tmp, TENURED);
  Literal* msg = _nf.NewLiteral(v8String);
  // 3) create argument list, call expression, expression statement and add the latter to the collector
  ZoneList<Expression*>* args = new (_z) ZoneList<Expression*>(1, _z);
  // pass the zone here as well, as in the other Add calls
  args->Add(msg, _z);
  Call* printCall = _nf.NewCall(_printVP, args, 0);
  ExpressionStatement* printStmt = _nf.NewExpressionStatement(printCall);
  collector->Add(printStmt, _z);
}
The last parameter of NewCall and NewUnresolved is a number specifying the position in the script. I assume this is used for debug/error messages to tell where an error happened. At least I never encountered problems with setting it to 0 (there is also a constant kNoPosition somewhere).
Some final words: this will not actually add a print statement after each statement, because Blocks (e.g. loop bodies) are statements that represent a list of statements, and loops are statements that have a condition expression and a body block. So you would need to inspect what kind of statement is currently being handled and recursively look into it. Rewriting blocks is pretty much the same as rewriting a function body.
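To make the recursion concrete, here is a minimal, self-contained sketch on a toy AST. None of these types are V8's: Stmt, MakeExpr, MakeNested, RewriteList and Dump are hypothetical names made up for illustration, and the "print" node just stands in for the call expression built above. The idea is the same as for the function body, only applied to every nested statement list first.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy AST for illustration only; these are NOT V8's classes.
struct Stmt {
  enum Kind { kExpr, kBlock, kLoop };
  Kind kind = kExpr;
  std::string text;                         // expression text, or the loop condition
  std::vector<std::unique_ptr<Stmt>> body;  // nested statements of a block or loop body
};

using StmtPtr = std::unique_ptr<Stmt>;
using StmtList = std::vector<StmtPtr>;

// Creates a plain expression statement.
StmtPtr MakeExpr(std::string text) {
  auto stmt = std::make_unique<Stmt>();
  stmt->kind = Stmt::kExpr;
  stmt->text = std::move(text);
  return stmt;
}

// Creates a block or a loop; for a loop, `text` is the condition.
StmtPtr MakeNested(Stmt::Kind kind, std::string text, StmtList body) {
  assert(kind != Stmt::kExpr);
  auto stmt = std::make_unique<Stmt>();
  stmt->kind = kind;
  stmt->text = std::move(text);
  stmt->body = std::move(body);
  return stmt;
}

// Rewrites a statement list: nested lists are instrumented first, then the
// original statement is kept and a "print" statement is appended after it.
void RewriteList(StmtList* list) {
  StmtList out;
  out.reserve(list->size() * 2);
  for (StmtPtr& stmt : *list) {
    if (stmt->kind != Stmt::kExpr) RewriteList(&stmt->body);
    out.push_back(std::move(stmt));
    out.push_back(MakeExpr("print"));
  }
  *list = std::move(out);
}

// Renders the tree as text so the result can be inspected.
std::string Dump(const StmtList& list) {
  std::string s;
  for (const StmtPtr& stmt : list) {
    switch (stmt->kind) {
      case Stmt::kExpr:  s += stmt->text + ";"; break;
      case Stmt::kBlock: s += "{" + Dump(stmt->body) + "}"; break;
      case Stmt::kLoop:  s += "while(" + stmt->text + "){" + Dump(stmt->body) + "}"; break;
    }
  }
  return s;
}
```

For a toy program consisting of a; followed by while(c){b;}, Dump after RewriteList gives a;print;while(c){b;print;}print; so the loop body is instrumented as well as the top level. In V8 the kind check would presumably be done with the statement's own type tests or with further VisitXYZ methods; that part is not shown here.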
But you will run into problems when you start to replace or modify existing statements, because the AST also carries information about branching. So if you replace a jump target for some condition you break your code. I guess this could be covered if one directly adds rewriting capabilities to the single expression and statement types instead of creating new ones to replace them.
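The problem with replaced nodes can be shown on a toy model. This is only a sketch under assumptions: Loop, Break, Arena, InstrumentByReplacing and InstrumentInPlace are hypothetical names, not V8 classes, and the arena merely imitates zone allocation by keeping old nodes alive. A break remembers the loop it leaves by pointer, so building a fresh loop node leaves the break pointing at the old one, while changing the existing node keeps the link intact.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model for illustration only; these are NOT V8's classes.
struct Loop {
  std::string cond;
  std::vector<std::string> body;  // statements, kept as plain text here
};

// A break statement refers to the loop it leaves by pointer.
struct Break {
  const Loop* target;
};

// Keeps every allocated node alive, loosely imitating zone allocation.
struct Arena {
  std::vector<std::unique_ptr<Loop>> loops;
  Loop* New(std::string cond) {
    loops.push_back(std::make_unique<Loop>());
    loops.back()->cond = std::move(cond);
    return loops.back().get();
  }
};

// Strategy A: build a new, instrumented node that the caller swaps in.
// Every Break that pointed at `old` still points at `old` afterwards.
Loop* InstrumentByReplacing(Arena* arena, const Loop* old) {
  assert(old != nullptr);
  Loop* fresh = arena->New(old->cond);
  fresh->body = old->body;
  fresh->body.push_back("print");
  return fresh;
}

// Strategy B: change the existing node; pointers to it stay valid.
void InstrumentInPlace(Loop* loop) {
  assert(loop != nullptr);
  loop->body.push_back("print");
}
```

With strategy A, the break's target no longer equals the loop that is actually part of the tree, which corresponds to the broken jump target described above. With strategy B the target is unchanged, which is the reason for preferring rewriting inside the existing node types.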
That's it so far; I hope it helps.