Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Force tableswitch instead of lookupswitch

Scala 2.11 compiles a match expression over a relatively dense Int range into a lookupswitch:

lookupswitch { // 21
    -12: 200
    -11: 200
    -10: 184
     -9: 190
     -8: 190
     -7: 190
     -6: 190
     -5: 190
     -4: 200
     -1: 200
      2: 195
      3: 195
      4: 195
      5: 195
      6: 184
      7: 184
     12: 184
     13: 184
     18: 184
     21: 184
     25: 184
default: 180
}

Whereas Java 7 compiles the equivalent switch statement into a tableswitch:

tableswitch { // -12 to 25
    -12: 168
    -11: 168
    -10: 177
     -9: 174
     -8: 174
     -7: 174
     -6: 174
     -5: 174
     -4: 168
     -3: 185
     -2: 185
     -1: 168
      0: 185
      1: 185
      2: 171
      3: 171
      4: 171
      5: 171
      6: 177
      7: 177
      8: 185
      9: 185
     10: 185
     11: 185
     12: 181
     13: 181
     14: 185
     15: 185
     16: 185
     17: 185
     18: 181
     19: 185
     20: 185
     21: 181
     22: 185
     23: 185
     24: 185
     25: 181
default: 185
}

Is there some way to force Scala into generating a tableswitch as well?

like image 801
fredoverflow Avatar asked Jul 08 '17 17:07

fredoverflow


1 Answers

You should not care about the bytecode, since modern JVMs are smart enough to compile both lookupswitch and tableswitch in a similarly efficient way.

Intuitively tableswitch should be faster, and this is also suggested by JVM specification:

Thus, a tableswitch instruction is probably more efficient than a lookupswitch where space considerations permit a choice.

However, the spec was written 20+ years ago, when JVM had no JIT compiler. Is there a performance difference in a modern HotSpot JVM?

The benchmark

package bench;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class SwitchBench {
    @Param({"1", "2", "3", "4", "5", "6", "7", "8"})
    int n;

    @Benchmark
    public long lookupSwitch() {
        return Switch.lookupSwitch(n);
    }

    @Benchmark
    public long tableSwitch() {
        return Switch.tableSwitch(n);
    }
}

To have precise control over the bytecode, I build Switch class with Jasmin.

.class public bench/Switch
.super java/lang/Object

.method public static lookupSwitch(I)I
    .limit stack 1

    iload_0
    lookupswitch
      1 : One
      2 : Two
      3 : Three
      4 : Four
      5 : Five
      6 : Six
      7 : Seven
      default : Other

One:
    bipush 11
    ireturn
Two:
    bipush 22
    ireturn
Three:
    bipush 33
    ireturn
Four:
    bipush 44
    ireturn
Five:
    bipush 55
    ireturn
Six:
    bipush 66
    ireturn
Seven:
    bipush 77
    ireturn
Other:
    bipush -1
    ireturn
.end method

.method public static tableSwitch(I)I
    .limit stack 1

    iload_0
    tableswitch 1
      One
      Two
      Three
      Four
      Five
      Six
      Seven
      default : Other

One: 
    bipush 11
    ireturn
Two:
    bipush 22
    ireturn
Three:
    bipush 33
    ireturn
Four:
    bipush 44
    ireturn
Five:
    bipush 55
    ireturn
Six:
    bipush 66
    ireturn
Seven:
    bipush 77
    ireturn
Other:
    bipush -1
    ireturn
.end method

The results show no performance difference between lookupswitch/tableswitch benchmarks, but there is a small variation depending on switch argument:

lookupswitch vs. tableswitch performance

Assembly

Let's verify the theory by looking at generated assembly code.
The following JVM option will help: -XX:CompileCommand=print,bench.Switch::*

  # {method} {0x0000000017498a48} 'lookupSwitch' '(I)I' in 'bench/Switch'
  # parm0:    rdx       = int
  #           [sp+0x20]  (sp of caller)
  0x000000000329b240: sub    $0x18,%rsp
  0x000000000329b247: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - bench.Switch::lookupSwitch@-1

  0x000000000329b24c: cmp    $0x4,%edx
  0x000000000329b24f: je     0x000000000329b2a5
  0x000000000329b251: cmp    $0x4,%edx
  0x000000000329b254: jg     0x000000000329b281
  0x000000000329b256: cmp    $0x2,%edx
  0x000000000329b259: je     0x000000000329b27a
  0x000000000329b25b: cmp    $0x2,%edx
  0x000000000329b25e: jg     0x000000000329b273
  0x000000000329b260: cmp    $0x1,%edx
  0x000000000329b263: jne    0x000000000329b26c  ;*lookupswitch
                                                 ; - bench.Switch::lookupSwitch@1
  ...

What we see here is a binary search starting with a mid value 4 (this explains why case 4 has the best performance in the graph above).

But the interesting thing is that tableSwitch is compiled exactly the same way!

  # {method} {0x0000000017528b18} 'tableSwitch' '(I)I' in 'bench/Switch'
  # parm0:    rdx       = int
  #           [sp+0x20]  (sp of caller)
  0x000000000332c280: sub    $0x18,%rsp
  0x000000000332c287: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - bench.Switch::tableSwitch@-1

  0x000000000332c28c: cmp    $0x4,%edx
  0x000000000332c28f: je     0x000000000332c2e5
  0x000000000332c291: cmp    $0x4,%edx
  0x000000000332c294: jg     0x000000000332c2c1
  0x000000000332c296: cmp    $0x2,%edx
  0x000000000332c299: je     0x000000000332c2ba
  0x000000000332c29b: cmp    $0x2,%edx
  0x000000000332c29e: jg     0x000000000332c2b3
  0x000000000332c2a0: cmp    $0x1,%edx
  0x000000000332c2a3: jne    0x000000000332c2ac  ;*tableswitch
                                                 ; - bench.Switch::tableSwitch@1
  ...

Jump table

But wait... Why binary search, not a jump table?

HotSpot JVM has an heuristics to generate a jump table only for switches with 10+ cases. This can be altered by the option -XX:MinJumpTableSize=.

OK, let's extend our test case with some more labels and check the generated code once again.

  # {method} {0x0000000017288a68} 'lookupSwitch' '(I)I' in 'bench/Switch'
  # parm0:    rdx       = int
  #           [sp+0x20]  (sp of caller)
  0x000000000307ecc0: sub    $0x18,%rsp         ;   {no_reloc}
  0x000000000307ecc7: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - bench.Switch::lookupSwitch@-1

  0x000000000307eccc: mov    %edx,%r10d
  0x000000000307eccf: dec    %r10d
  0x000000000307ecd2: cmp    $0xa,%r10d
  0x000000000307ecd6: jb     0x000000000307ece9
  0x000000000307ecd8: mov    $0xffffffff,%eax
  0x000000000307ecdd: add    $0x10,%rsp
  0x000000000307ece1: pop    %rbp
  0x000000000307ece2: test   %eax,-0x1faece8(%rip)        # 0x00000000010d0000
                                                ;   {poll_return}
  0x000000000307ece8: retq   
  0x000000000307ece9: movslq %edx,%r10
  0x000000000307ecec: movabs $0x307ec60,%r11    ;   {section_word}
  0x000000000307ecf6: jmpq   *-0x8(%r11,%r10,8)  ;*lookupswitch
                                                ; - bench.Switch::lookupSwitch@1
                      ^^^^^^^^^^^^^^^^^^^^^^^^^

Yes! Here is our computed jump instruction. Note, this is generated for lookupswitch. But there will be exactly the same one for tableswitch.

Amazingly, HotSpot JVM can generate jump tables even for switches with gaps and with outliers. -XX:MaxJumpTableSparseness controls how large the gaps can be. E.g. if there are labels from 1 to 10, then from 13 to 20 and the last label with the value 99 - JIT will generate a guard test for the value 99, and for the rest labels it will create a table.

Source code

HotSpot source code will finally convince there should be no performance difference between lookupswitch and tableswitch after a method is JIT-compiled with C2. That's basically because the parsing of both instructions ends up with a call to the same jump_switch_ranges function that works for an arbitrary set of labels.

Conclusion

As we saw, HotSpot JVM may compile tableswitch using a binary search and lookupswitch using a jump table, or vice versa. This depends on the number and the density of labels, but not on the bytecode itself.

So, answering your original question - you don't need to!

like image 168
apangin Avatar answered Oct 03 '22 20:10

apangin