Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting the complete call graph of a scala project (tough one)

I would like to extract from a given Scala project, the call graph of all methods which are part of the project's own source.

As I understand, the presentation compiler doesn't enable that, and it requires going down all the way down to the actual compiler (or a compiler plugin?).

Can you suggest complete code, that would safely work for most scala projects but those that use the wackiest dynamic language features? for the call graph, I mean a directed (possibly cyclic) graph comprising class/trait + method vertices where an edge A -> B indicates that A may call B.

Calls to/from libraries should be avoided or "marked" as being outside the project's own source.

EDIT:

See my macro paradise derived prototype solution, based on @dk14's lead, as an answer below. Hosted on github at https://github.com/matanster/sbt-example-paradise.

like image 558
matanster Avatar asked Apr 22 '15 00:04

matanster


2 Answers

Here's the working prototype, which prints the necessary underlying data to the console as a proof of concept. http://goo.gl/oeshdx.

How This Works

I have adapted the concepts from @dk14 on top boilerplate from macro paradise.

Macro paradise lets you define an annotation that will apply your macro over any annotated object in your source code. From there you have access to the AST that the compiler generates for the source, and scala reflection api can be used to explore the type information of the AST elements. Quasiquotes (the etymology is from haskell or something) are used to match the AST for the relevant elements.

More about Quasiquotes

The generally important thing to note is that quasiquotes work over an AST, but they are a strange-at-first-glance api and not a direct representation of the AST (!). The AST is picked up for you by paradise's macro annotation, and then quasiquotes are the tool for exploring the AST at hand: you match, slice and dice the abstract syntax tree using quasiquotes.

The practical thing to note about quasiquotes is that there are fixed quasiquote templates for matching each type of scala AST - a template for a scala class definition, a template for a scala method definition, etc. These tempaltes are all provided here, making it very simple to match and deconstruct the AST at hand to its interesting constituents. While the templates may look daunting at first glance, they are mostly just templates mimicking the scala syntax, and you may freely change the $ prepended variable names within them to names that feel nicer to your taste.

I still need to further hone the quasiquote matches I use, which currently aren't perfect. However, my code seems to produce the desired result for many cases, and honing the matches to 95% precision may be well doable.

Sample Output

found class B
class B has method doB
found object DefaultExpander
object DefaultExpander has method foo
object DefaultExpander has method apply
  which calls Console on object scala of type package scala
  which calls foo on object DefaultExpander.this of type object DefaultExpander
  which calls <init> on object new A of type class A
  which calls doA on object a of type class A
  which calls <init> on object new B of type class B
  which calls doB on object b of type class B
  which calls mkString on object tags.map[String, Seq[String]](((tag: logTag) => "[".+(Util.getObjectName(tag)).+("]")))(collection.this.Seq.canBuildFrom[String]) of type trait Seq
  which calls map on object tags of type trait Seq
  which calls $plus on object "[".+(Util.getObjectName(tag)) of type class String
  which calls $plus on object "[" of type class String
  which calls getObjectName on object Util of type object Util
  which calls canBuildFrom on object collection.this.Seq of type object Seq
  which calls Seq on object collection.this of type package collection
  .
  .
  .

It is easy to see how callers and callees can be correlated from this data, and how call targets outside the project's source can be filtered or marked out. This is all for scala 2.11. Using this code, one will need to prepend an annotation to each class/object/etc in each source file.

The challenges that remain are mostly:

Challenges remaining:

  1. This crashes after getting the job done. Hinging on https://github.com/scalamacros/paradise/issues/67
  2. Need to find a way to ultimately apply the magic to entire source files without manually annotating each class and object with the static annotation. This is rather minor for now, and admittedly, there are benefits for being able to control classes to include and ignore anyway. A preprocessing stage that implants the annotation before (almost) every top level source file definition, would be one nice solution.
  3. Honing the matchers such that all and only relevant definitions are matched - to make this general and solid beyond my simplistic and cursory testing.

Alternative Approach to Ponder

acyclic brings to mind a quite opposite approach that still sticks to the realm of the scala compiler - it inspects all symbols generated for the source, by the compiler (as much as I gather from the source). What it does is check for cyclic references (see the repo for a detailed definition). Each symbol supposedly has enough information attached to it, to derive the graph of references that acyclic needs to generate.

A solution inspired by this approach may, if feasible, locate the parent "owner" of every symbol rather than focus on the graph of source files connections as acyclic itself does. Thus with some effort it would recover the class/object ownership of each method. Not sure if this design would not computationally explode, nor how to deterministically obtain the class encompassing each symbol.

The upside would be that there is no need for macro annotations here. The downside is that this cannot sprinkle runtime instrumentation as the macro paradise rather easily allows, which could be at times useful.

like image 145
matanster Avatar answered Nov 09 '22 23:11

matanster


It requires more precise analysis, but as a start this simple macro will print all possible applyies, but it requires macro-paradise and all traced classess should have @trace annotation:

class trace extends StaticAnnotation { 
  def macroTransform(annottees: Any*) = macro tracerMacro.impl
}

object tracerMacro {

  def impl(c: Context)(annottees: c.Expr[Any]*): c.Expr[Any] = {

    import c.universe._
    val inputs = annottees.map(_.tree).toList
    def analizeBody(name: String, method: String, body: c.Tree) = body.foreach {
      case q"$expr(..$exprss)" => println(name + "." + method + ": " + expr)
      case _ =>
    }


    val output = inputs.head match {
      case q"class $name extends $parent { ..$body }" =>
        q"""
            class $name extends $parent {
              ..${
                   body.map {
                       case x@q"def $method[..$tt] (..$params): $typ = $body" =>
                         analizeBody(name.toString, method.toString, body)
                         x

                       case x@q"def $method[..$tt]: $typ = $body" =>
                         analizeBody(name.toString, method.toString, body)
                        x

                   }
                 }
            }
          """
      case x => sys.error(x.toString)
    }




    c.Expr[Any](output)
  }
}

Input:

  @trace class MyF {
    def call(param: Int): Int = {
      call2(param)
      if(true) call3(param) else cl()
    }
    def call2(oaram: Int) = ???
    def cl() = 5
    def call3(param2: Int) = ???
  }

Output (as compiler's warnings, but you may output to file intead of println):

  Warning:scalac: MyF.call: call2
  Warning:scalac: MyF.call: call3
  Warning:scalac: MyF.call: cl

Of course, you might want to c.typeCheck(input) it (as now expr.tpe on found trees is equals null) and find which class this calling method belongs to actually, so the resulting code may not be so trivial.

P.S. macroAnnotations give you unchecked tree (as it's on earlier compiler stage than regular macroses), so if you want something typechecked - the best way is surround the piece of code you want to typecheck with call of some your regular macro, and process it inside this macro (you can even pass some static parameters). Every regular macro inside tree produced by macro-annotation - will be executed as usual.

like image 4
dk14 Avatar answered Nov 10 '22 00:11

dk14