
Best practice for detecting changes to functions in Scala programs?

I'm working on a Scala-based script language (internal DSL) that allows users to define multiple data transformation functions in a Scala script file. Since applying these functions can take several hours, I would like to cache the results in a database. Users are allowed to change the definitions of the transformation functions and to add new functions. However, when the user restarts the application with a slightly modified script, I would like to execute only those functions that have been changed or added. The question is how to detect those changes. For simplicity, let us assume that the user can only modify the script file, so any reference to something not defined in the script can be assumed to be unchanged.

In this case what's the best practice for detecting changes to such user-defined functions?

So far I have thought about:

  • parsing the script file and calculating fingerprints based on the source code of the function definitions
  • getting the bytecode of each function at runtime and building fingerprints based on this data
  • applying the functions to some test data and calculating fingerprints on the results

However, all three approaches have their pitfalls.

  • Writing a parser for Scala to extract the function definitions could be quite some work, especially if you want to detect changes that indirectly affect the behaviour of your functions (e.g. if your function calls another (changed) function defined in the script).
  • Bytecode analysis could be another option, but I have never worked with those libraries. Thus I have no idea whether they can solve my problem or how they deal with Java's dynamic binding.
  • The approach with example data is definitely the simplest one, but has the drawback that different user-defined functions could be accidentally mapped to the same fingerprint if they return the same results for my test data.

Does anyone have experience with one of these "solutions", or can anyone suggest a better one?

Stefan Endrullis asked Sep 23 '11 15:09



1 Answer

The second option doesn't look difficult. For example, with the Javassist library, obtaining the bytecode of a method is as simple as

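import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.bytecode.CodeAttribute;

// className is the fully qualified name of the class to inspect
CtClass c = ClassPool.getDefault().get(className);
for (CtMethod m : c.getDeclaredMethods()) {
    CodeAttribute ca = m.getMethodInfo().getCodeAttribute();
    if (ca != null) { // null for methods without a body, i.e. native or abstract
        byte[] byteCode = ca.getCode();
        ...
    }
}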

So, as long as you assume that the results of your methods depend only on the code of those methods, it's pretty straightforward.
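To turn those bytes into a fingerprint you can store next to the cached result, a cryptographic hash is probably enough. The following is only a minimal sketch using the JDK's `MessageDigest`; the class `MethodFingerprint` and its method `of` are hypothetical names, not part of Javassist. One caveat to keep in mind: the code array refers to literals and called methods by constant-pool index, so the raw bytes alone may miss a changed string constant, which is why the sketch also mixes in a signature string that you would have to supply yourself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not a Javassist class): hashes a method's
// signature together with its raw code bytes.
final class MethodFingerprint {
    private MethodFingerprint() {}

    static String of(String signature, byte[] byteCode) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(signature.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between signature and code
            md.update(byteCode);
            // Render the digest as lowercase hex so it is easy to store.
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

Inside the loop above this could be called as `MethodFingerprint.of(m.getLongName(), byteCode)`, and the function would be re-executed only if the stored fingerprint differs from the new one.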

UPDATE: On the other hand, since your methods are written in Scala, they probably contain some closures, so that parts of their code reside in anonymous classes, and you may need to trace usage of these classes somehow.
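One way to handle this, and also functions that call other functions defined in the script, might be to fold the fingerprints of everything a function reaches into its own fingerprint. The sketch below assumes you have already built two maps yourself: `ownHash` (name of a method or closure class to the hash of its own code) and `uses` (name to the names it references). How you fill `uses` is left open here; `TransitiveFingerprint` is a hypothetical name, and names missing from `ownHash` are treated as defined outside the script and therefore unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: combines a unit's own hash with the hashes of
// everything it transitively uses inside the script.
final class TransitiveFingerprint {
    private TransitiveFingerprint() {}

    static String of(String root, Map<String, String> ownHash,
                     Map<String, Set<String>> uses) {
        // Collect all script-defined units reachable from root.
        // The sorted set gives a deterministic order and stops cycles.
        Set<String> reachable = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String name = todo.pop();
            if (!ownHash.containsKey(name) || !reachable.add(name)) {
                continue; // outside the script (assumed unchanged) or already seen
            }
            for (String dep : uses.getOrDefault(name, Collections.<String>emptySet())) {
                todo.push(dep);
            }
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String name : reachable) {
                String entry = name + "=" + ownHash.get(name) + "\n";
                md.update(entry.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this scheme a change in a helper or closure class changes the fingerprint of every function that reaches it, while unrelated functions keep theirs. Hashing the whole reachable set, rather than recursing, keeps mutually recursive functions from causing an endless loop.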

axtavt answered Sep 20 '22 13:09
