Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ifeq/ifne JVM opcode always branches

[TL;DR: the following JVM bytecode instructions seems not to work:

iconst_0
istore 6
...sequential
iinc 6 1
jsr L42
...
; L42
iload 6
ifeq L53 ; Always branches!!!
astore 8
iinc 6 -1
; L53
LDC 100
ISUB     ; ERROR, returnAddress is at the top of the stack

A test .class can be found here (with slightly more complex logic). If you want to know more about why I'm seeing these instructions, please keep reading.]

I'm writing a Whitespace compiler targeting JVM bytecode. Although being an esoteric language, Whitespace describes an interesting set of assembly instructions to a stack machine, which maps nicely to the JVM.

Whitespace has labels, which are both targets for jump (goto/jump-if-zero/jump-if-negative) and function calls. The relevant instructions (with names given by me, in the spec they are given as combinations of space, tabs and newlines) are:

  • mark <label>: sets a label for the following instruction
  • jump[-if-neg|-if-zero] <label>: jumps unconditionally or conditionally to the given label
  • call <label>: call a function pointed by label
  • end <label>: ends a function, returning to the caller.

My compiler outputs the whole Whitespace program in the main method of a class. The simplest way to implement call and end is using the JSR and RET opcodes, which are made to implement subroutines. After a JSR operation the stack will contain a returnAddress reference that should be stored in a variable for later use in end.

However, as mark can be either call-ed or jump-ed into, the stack may or may not contain the returnAddress reference. I decided to use a boolean variable (call-bit, at address 6) to store how the mark was reached, and then test if it should store the top of the stack into a local variable (return-address, at address 8). The implementation for each instruction is as follows:

; ... initialization
iconst_0
istore 6 ; local variable #6 holds the call bit

# call
iinc 6 1 ; sets the call bit
jsr Lxxx ; jumps to the given label, pushing a returnAddress to the stack

# mark
; Lxxx
iload 6       ; loads the call bit
ifeq Lxxx-end ; SHOULD jump to mark's end if the call bit is not set
; call bit is set: mark was call-ed and returnAddress is in the stack
astore 8      ; stores returnAddress to local variable #8
iinc 6 -1     ; resets the call bit
; Lxxx-end

# end
ret 8 ; returns using the stored returnAddress

The problem: ifeq ALWAYS branches. I also tried reversing the logic (call-bit -> jump-bit, ifeq->ifne), and even simply switching to ifne (which would be wrong)... but the if always branches to the end. After a call, the returnAddress stays in the stack and the next operation blows up.

I've used ASM's analyzer to watch the stack to debug all this, but have just asserted this behavior and can't find what I'm doing wrong. My one suspicion is that there's more to iinc, or to ifeq than my vain philosophy can imagine. I'll admit that I've only read the instruction set page and ASM's pertinent documentation for this project, but I'd hope that someone can bring a solution from the top of their mind.

In this folder there are the relevant files, including the executable class and the original Whitespace, as well as the output of javap -c and ASM analysis.

like image 383
Bruno Kim Avatar asked Nov 09 '22 16:11

Bruno Kim


1 Answers

Found a possible reason: the problem is not during execution, but with the verifier. When it seemed that it "always branched", was in fact the verifier testing all possible outcomes of an if so it could be sure the stack would look like the same. My code relies on a reference (returnAddress) maybe or maybe not being present on the stack and the verifier can't check that.

That said, the sample code does not run with the -noverify flag, but other, simpler examples that failed verification did execute correctly.

like image 129
Bruno Kim Avatar answered Nov 14 '22 21:11

Bruno Kim