I would like to extract from a given Scala project, the call graph of all methods which are part of the project's own source.
As I understand, the presentation compiler doesn't enable that, and it requires going down all the way down to the actual compiler (or a compiler plugin?).
Can you suggest complete code, that would safely work for most scala projects but those that use the wackiest dynamic language features? for the call graph, I mean a directed (possibly cyclic) graph comprising class/trait + method
vertices where an edge A -> B indicates that A may call B.
Calls to/from libraries should be avoided or "marked" as being outside the project's own source.
EDIT:
See my macro paradise derived prototype solution, based on @dk14's lead, as an answer below. Hosted on github at https://github.com/matanster/sbt-example-paradise.
Here's the working prototype, which prints the necessary underlying data to the console as a proof of concept. http://goo.gl/oeshdx.
I have adapted the concepts from @dk14 on top boilerplate from macro paradise.
Macro paradise lets you define an annotation that will apply your macro over any annotated object in your source code. From there you have access to the AST that the compiler generates for the source, and scala reflection api can be used to explore the type information of the AST elements. Quasiquotes (the etymology is from haskell or something) are used to match the AST for the relevant elements.
The generally important thing to note is that quasiquotes work over an AST, but they are a strange-at-first-glance api and not a direct representation of the AST (!). The AST is picked up for you by paradise's macro annotation, and then quasiquotes are the tool for exploring the AST at hand: you match, slice and dice the abstract syntax tree using quasiquotes.
The practical thing to note about quasiquotes is that there are fixed quasiquote templates for matching each type of scala AST - a template for a scala class definition, a template for a scala method definition, etc. These tempaltes are all provided here, making it very simple to match and deconstruct the AST at hand to its interesting constituents. While the templates may look daunting at first glance, they are mostly just templates mimicking the scala syntax, and you may freely change the $
prepended variable names within them to names that feel nicer to your taste.
I still need to further hone the quasiquote matches I use, which currently aren't perfect. However, my code seems to produce the desired result for many cases, and honing the matches to 95% precision may be well doable.
found class B
class B has method doB
found object DefaultExpander
object DefaultExpander has method foo
object DefaultExpander has method apply
which calls Console on object scala of type package scala
which calls foo on object DefaultExpander.this of type object DefaultExpander
which calls <init> on object new A of type class A
which calls doA on object a of type class A
which calls <init> on object new B of type class B
which calls doB on object b of type class B
which calls mkString on object tags.map[String, Seq[String]](((tag: logTag) => "[".+(Util.getObjectName(tag)).+("]")))(collection.this.Seq.canBuildFrom[String]) of type trait Seq
which calls map on object tags of type trait Seq
which calls $plus on object "[".+(Util.getObjectName(tag)) of type class String
which calls $plus on object "[" of type class String
which calls getObjectName on object Util of type object Util
which calls canBuildFrom on object collection.this.Seq of type object Seq
which calls Seq on object collection.this of type package collection
.
.
.
It is easy to see how callers and callees can be correlated from this data, and how call targets outside the project's source can be filtered or marked out. This is all for scala 2.11. Using this code, one will need to prepend an annotation to each class/object/etc in each source file.
The challenges that remain are mostly:
acyclic brings to mind a quite opposite approach that still sticks to the realm of the scala compiler - it inspects all symbols generated for the source, by the compiler (as much as I gather from the source). What it does is check for cyclic references (see the repo for a detailed definition). Each symbol supposedly has enough information attached to it, to derive the graph of references that acyclic needs to generate.
A solution inspired by this approach may, if feasible, locate the parent "owner" of every symbol rather than focus on the graph of source files connections as acyclic itself does. Thus with some effort it would recover the class/object ownership of each method. Not sure if this design would not computationally explode, nor how to deterministically obtain the class encompassing each symbol.
The upside would be that there is no need for macro annotations here. The downside is that this cannot sprinkle runtime instrumentation as the macro paradise rather easily allows, which could be at times useful.
It requires more precise analysis, but as a start this simple macro will print all possible applyies, but it requires macro-paradise and all traced classess should have @trace
annotation:
class trace extends StaticAnnotation {
def macroTransform(annottees: Any*) = macro tracerMacro.impl
}
object tracerMacro {
def impl(c: Context)(annottees: c.Expr[Any]*): c.Expr[Any] = {
import c.universe._
val inputs = annottees.map(_.tree).toList
def analizeBody(name: String, method: String, body: c.Tree) = body.foreach {
case q"$expr(..$exprss)" => println(name + "." + method + ": " + expr)
case _ =>
}
val output = inputs.head match {
case q"class $name extends $parent { ..$body }" =>
q"""
class $name extends $parent {
..${
body.map {
case x@q"def $method[..$tt] (..$params): $typ = $body" =>
analizeBody(name.toString, method.toString, body)
x
case x@q"def $method[..$tt]: $typ = $body" =>
analizeBody(name.toString, method.toString, body)
x
}
}
}
"""
case x => sys.error(x.toString)
}
c.Expr[Any](output)
}
}
Input:
@trace class MyF {
def call(param: Int): Int = {
call2(param)
if(true) call3(param) else cl()
}
def call2(oaram: Int) = ???
def cl() = 5
def call3(param2: Int) = ???
}
Output (as compiler's warnings, but you may output to file intead of println):
Warning:scalac: MyF.call: call2
Warning:scalac: MyF.call: call3
Warning:scalac: MyF.call: cl
Of course, you might want to c.typeCheck(input)
it (as now expr.tpe
on found trees is equals null
) and find which class this calling method belongs to actually, so the resulting code may not be so trivial.
P.S. macroAnnotations give you unchecked tree (as it's on earlier compiler stage than regular macroses), so if you want something typechecked - the best way is surround the piece of code you want to typecheck with call of some your regular macro, and process it inside this macro (you can even pass some static parameters). Every regular macro inside tree produced by macro-annotation - will be executed as usual.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With