Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java - How to overcome the maximum method size in automatically-generated code

I have an unusual requirement: My application generates Java code automatically from a very long script (written in a dynamically-typed language). The script is so long that I hit the maximum method size of 65k of the JVM.

The script consists only of simple instructions on primitive types (no call to other functions besides mathematical ones). It may look like:

...
a = b * c + sin(d)
...
if a>10 then
   e = a * 2
else
   e = a * abs(b)
end
...

... which is transformed as:

...
double a = b * c + Math.sin(d);
...
double e;
if(a>10){
   e = a * 2;
}else{
   e = a * Math.abs(b);
}
...


My first idea to overcome the method size limitation was the following:
  • Turn all local variables into fields
  • Split the code every 100 lines (or longer if needed in case of a if/else block) in separate methods.

Something like:

class AutoGenerated {

   double a,b,c,d,e,....;

   void run1(){
      ...
      a = b * c + sin(d);
      ...
      run2();
   }

   void run2(){
      ...
      if(a>10){
          e = a * 2;
      }else{
          e = a * Math.abs(b);
      }
      ...
      run3();
   }

   ...
}

Do you know of any other way that would be more efficient? Note that I need the code to run as fast as possible as it will be executed in long loops. I cannot resort to compiling to C, as interoperability is also an issue...

I would also appreciate pointers to libraries that might help me.

like image 246
Eric Leibenguth Avatar asked Aug 25 '15 10:08

Eric Leibenguth


2 Answers

We are using the similar approach in one of the projects despite its disadvantages mentioned by other people. We call the multiple generated methods from single launcher method like @Marco13 suggests. We actually calculate (pretty precisely) the size of the generated bytecode and start a new method only when limit is reached. Our math formulas which we translate to Java code are available as AstTree and we have a special visitor which counts the bytecode length per each expression. For such simple programs it's quite stable across Java versions and different compilers. So we don't create methods more than necessary. In our case it's quite hard to emit the bytecode directly, but you can try to do it for your language using ASM or similar library (this way, of course, ASM will calculate the bytecode length for you).

We usually store the data variables in single double[] array (we don't need other types) and pass it as parameter. This way you don't need the enormous number of fields (sometimes we have thousands of variables). On the other hand local array access may take more bytecode bytes compared to field access for index higher than 127.

Another concern is the constant pool size. We usually have many double constants in the autogenerated code. If you declare many fields and/or methods, their names also take the constant pool entries. So it's possible to hit the class constant pool limit. Sometimes we hit it and generate nested classes to overcome this problem.

Other people suggest to tweak JVM options as well. Use these advices carefully as they will affect not only this autogenerated class, but every other class as well (I assume that other code is also executed in the same JVM in your case).

like image 159
Tagir Valeev Avatar answered Nov 14 '22 18:11

Tagir Valeev


Converting the local variables into fields may actually have a negative impact on the performance as long as the code is not optimized by the JIT (see this question and related ones for further infos). But I see that, depending on the variables this involves, there may hardly be other feasible options.


There may be additional limits for the compilation and method sizes. Peter Lawrey mentioned in the comments that "...methods over 8 KB in size are not compiled by default" - I wasn't aware of this one, but he usually knows what he's talking about, so you should dig a bit deeper here. Additionally, wou might want to have a look at the HotSpot VM options to see which further limits and settings might be relevant for you. I primarily thought that

-XX:MaxInlineSize=35 : Maximum bytecode size of a method to be inlined.

might be something to keep in mind.

(In fact, calling so many methods with a size of MaxInlineSize that inlining all these calls will exceed the size of 65k bytes for the containing method may be a neatly-nasty test case for the robustness and edge case tests of the inlining process...)


You sketched a "telescoping" call scheme for the methods:

void run1(){
   ...
   run2();
}

void run2(){
   ...
   run3();
}

This may lead into problems as well: Considering that you have >650 of these methods (in the best case), this will at least lead to a very deep stack, and might in fact cause a StackOverflowError - again, depending on some of the Memory Options. You might have to increase the Stack Size by setting the -Xss parameter accordingly.


The actual problem description was a bit vague, and without further information about the code that is to be generated (also regarding questions about e.g. how many local variables you need, that might have to be turned into instance variables etc), I'd propose the following:

  • Create many small methods if possible (considering the MaxInlineSize)
  • Try to reuse these small methods (if such reusability can be detected from the input with reasonable effort)
  • Call these methods sequentially, as in

    void run()
    {
        run0();
        run1();
        ...
        run2000();
    }
    

    to avoid problems with the stack size.


However, if you added further examples or details, one could probably give more focussed advice. This could even be a "complete" example - not necessarily involving thousands of lines of codes, but showing the actual patterns that appear there.

like image 41
Marco13 Avatar answered Nov 14 '22 18:11

Marco13