Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Illegal Opcodes in the JVM

I've recently come across while developing a library that performs operations on JVM bytecode some opcodes on which there is no documentation (that I've found), yet which are recognized by the JVM reference implementation. I've found a list of these, and they are:

BREAKPOINT = 202;
LDC_QUICK = 203;
LDC_W_QUICK = 204;
LDC2_W_QUICK = 205;
GETFIELD_QUICK = 206;
PUTFIELD_QUICK = 207;
GETFIELD2_QUICK = 208;
PUTFIELD2_QUICK = 209;
GETSTATIC_QUICK = 210;
PUTSTATIC_QUICK = 211;
GETSTATIC2_QUICK = 212;
PUTSTATIC2_QUICK = 213;
INVOKEVIRTUAL_QUICK = 214;
INVOKENONVIRTUAL_QUICK = 215;
INVOKESUPER_QUICK = 216;
INVOKESTATIC_QUICK = 217;
INVOKEINTERFACE_QUICK = 218;
INVOKEVIRTUALOBJECT_QUICK = 219;
NEW_QUICK = 221;
ANEWARRAY_QUICK = 222;
MULTIANEWARRAY_QUICK = 223;
CHECKCAST_QUICK = 224;
INSTANCEOF_QUICK = 225;
INVOKEVIRTUAL_QUICK_W = 226;
GETFIELD_QUICK_W = 227;
PUTFIELD_QUICK_W = 228;
IMPDEP1 = 254;
IMPDEP2 = 255;

They seem to be replacements for their other implementations, yet have different opcodes. After a long period of trawling page after page through Google, I came across a mention of the LDC*_QUICK opcodes in this document.

Quote from it on the LDC_QUICK opcode:

Operation Push item from constant pool

Forms ldc_quick = 203 (0xcb)

Stack ... ..., item

Description The index is an unsigned byte that must be a valid index into the constant pool of the current class (§3.6). The constant pool item at index must have already been resolved and must be one word wide. The item is fetched from the constant pool and pushed onto the operand stack.

Notes The opcode of this instruction was originally ldc. The operand of the ldc instruction is not modified.

Alright. Seemed interesting, and so I decided to try it out. LDC_QUICK seems to have the same format as LDC, so I proceeded into changing a LDC opcode to a LDC_QUICK one. This resulted in a failure, though the JVM obviously recognized it. After attempting to run the modified file, the JVM crashed with the following output:

Exception in thread "main" java.lang.VerifyError: Bad instruction: cc
Exception Details:
  Location:
    Test.main([Ljava/lang/String;)V @9: fast_bgetfield
  Reason:
    Error exists in the bytecode
  Bytecode:
    0000000: bb00 0559 b700 064c 2bcc 07b6 0008 572b
    0000010: b200 09b6 000a 5710 0ab8 000b 08b8 000c
    0000020: 8860 aa00 0000 0032 0000 0001 0000 0003
    0000030: 0000 001a 0000 0022 0000 002a b200 0d12
    0000040: 0eb6 000f b200 0d12 10b6 000f b200 0d12
    0000050: 11b6 000f bb00 1259 2bb6 0013 b700 14b8
    0000060: 0015 a700 104d 2cb6 0016 b200 0d12 17b6
    0000070: 000f b1
  Exception Handler Table:
    bci [84, 98] => handler: 101
  Stackmap Table:
    append_frame(@60,Object[#41])
    same_frame(@68)
    same_frame(@76)
    same_frame(@84)
    same_locals_1_stack_item_frame(@101,Object[#42])
    same_frame(@114)

        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Unknown Source)
        at java.lang.Class.getMethod0(Unknown Source)
        at java.lang.Class.getMethod(Unknown Source)
        at sun.launcher.LauncherHelper.validateMainClass(Unknown Source)
        at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)

The above error gives mixed messages. Obviously, class file verification failed: java.lang.VerifyError: Bad instruction: cc. At the same time, the JVM recognized the opcode: @9: fast_bgetfield. Additionally, it seems to think that it is a different instruction, because fast_bgetfield does not imply constant pushing...

I think its fair to say I am quite confused. What are these illegal opcodes? Do JVM's run them? Why am I receiving VerifyErrors? Deprecation? And do they have an advantage over their documented counterparts?

Any insight would be greatly appreciated.

like image 940
Xyene Avatar asked Feb 11 '13 23:02

Xyene


People also ask

How many opcodes are in JVM?

Java bytecode is the instruction set of the Java virtual machine. Each bytecode is composed of one, or in some cases two bytes that represent the instruction (opcode), along with zero or more bytes for passing parameters. Currently, in Jan 2017, there are 205 opcodes in use out of 256 possible byte-long opcodes.

What is opcode in JVM?

A Java Virtual Machine instruction consists of an opcode specifying the operation to be performed, followed by zero or more operands embodying values to be operated upon. This chapter gives details about the format of each Java Virtual Machine instruction and the operation it performs.

How many Java opcodes are there?

Currently, the Java Virtual Machine is constrained to a single byte of instructions. This leaves a mere 256 opcodes for implementations to work with. Of these, approximately 202 are already in use and 3 are reserved for special use cases, leaving just 55 remaining for use in the future.

Does JVM have registers?

The JVM has a program counter and three registers that manage the stack. It has few registers because the bytecode instructions of the JVM operate primarily on the stack. This stack-oriented design helps keep the JVM's instruction set and implementation small.


2 Answers

The first edition of the Java virtual machine specification described a technique used by one of Sun's early implementations of the Java virtual machine to speed up the interpretation of bytecodes. In this scheme, opcodes that refer to constant pool entries are replaced by a "_quick" opcode when the constant pool entry is resolved. When the virtual machine encounters a _quick instruction, it knows the constant pool entry is already resolved and can therefore execute the instruction faster.

The core instruction set of the Java virtual machine consists of 200 single-byte opcodes. These 200 opcodes are the only opcodes you will ever see in class files. Virtual machine implementations that use the "_quick" technique use another 25 single-byte opcodes internally, the "_quick" opcodes.

For example, when a virtual machine that uses the _quick technique resolves a constant pool entry referred to by an ldc instruction (opcode value 0x12), it replaces the ldc opcode byte in the bytecode stream with an ldc_quick instruction (opcode value 0xcb). This technique is part of the process of replacing a symbolic reference with a direct reference in Sun's early virtual machine.

For some instructions, in addition to overwriting the normal opcode with a _quick opcode, a virtual machine that uses the _quick technique overwrites the operands of the instruction with data that represents the direct reference. For example, in addition to replacing an invokevirtual opcode with an invokevirtual_quick, the virtual machine also puts the method table offset and the number of arguments into the two operand bytes that follow every invokevirtual instruction. Placing the method table offset in the bytecode stream following the invokevirtual_quick opcode saves the virtual machine the time it would take to look up the offset in the resolved constant pool entry.

Chapter 8 of Inside the Java Virtual Machine

Basically you cannot just put the opcode in the class file. Only the JVM can do that after it resolves the operands.

like image 122
jdb Avatar answered Oct 30 '22 03:10

jdb


I don't know about all of the opcodes you have listed, but three of them—breakpoint, impdep1, and impdep2—are reserved opcodes documented in Section 6.2 of the Java Virtual Machine Specification. It says, in part:

Two of the reserved opcodes, numbers 254 (0xfe) and 255 (0xff), have the mnemonics impdep1 and impdep2, respectively. These instructions are intended to provide "back doors" or traps to implementation-specific functionality implemented in software and hardware, respectively. The third reserved opcode, number 202 (0xca), has the mnemonic breakpoint and is intended to be used by debuggers to implement breakpoints.

Although these opcodes have been reserved, they may be used only inside a Java virtual machine implementation. They cannot appear in valid class files. . . .

I suspect (from their names) that the rest of the other opcodes are part of the JIT mechanism and also cannot appear in a valid class file.

like image 36
Ted Hopp Avatar answered Oct 30 '22 04:10

Ted Hopp