
What is the jmpl instruction in x86?

x86 assembly design has instruction suffix, such as l(long), w(word), b(byte).
So I assumed jmpl would be a long jmp.

But it behaved strangely when I assembled it:

Test1 jmp: assembly source, and disassembly

main:
  jmp main

eb fe     jmp 0x0804839b <main> 

Test2 jmpl: assembly source, and disassembly

main:
  jmpl main       # added l suffix

ff 25 9b 83 04 08   jmp *0x0804839b

Compared to Test1, Test2 result is unexpected.
I think it should be assembled the same as Test1.


Question:
Is jmpl some different instruction in 8086 design?
(According to here, jmpl in SPARC means "jump and link". Is it something like this?)

...Or is this just a bug in GNU assembler?

Jiwon asked Jan 27 '19 09:01


2 Answers

An l operand-size suffix implies an indirect jmp, unlike with calll main which is still a relative near-call. This inconsistency is pure insanity in AT&T syntax design.

(And since you're using it with an operand like main, it becomes a memory-indirect jump, doing a data load from main and using that as the new EIP value.)
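To make that data load concrete, here is a minimal Python sketch of what EIP would become in the question's Test2. It assumes main sits at 0x0804839b and that memory there holds exactly the six bytes from the Test2 disassembly; the `memory` dict and `load_dword` helper are hypothetical names used only for illustration.

```python
import struct

# Assumption: main is at 0x0804839b and holds the Test2 instruction bytes.
MAIN = 0x0804839B
memory = {MAIN + i: b for i, b in enumerate(bytes.fromhex("ff259b830408"))}

def load_dword(addr):
    """Little-endian 32-bit load, modelling the load done by jmp *addr."""
    raw = bytes(memory[addr + i] for i in range(4))
    return struct.unpack("<I", raw)[0]

new_eip = load_dword(MAIN)
print(hex(new_eip))  # 0x839b25ff
```

Under those assumptions the jump does not loop back to main: it lands on whatever address the instruction's own first four bytes happen to spell out.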

You never need to use the jmpl mnemonic, you can and should indicate indirect jumps using * on the operand. Like jmp *%eax to set EIP = EAX, or jmp *4(%edi, %ecx, 4) to index a jump table, or jmp *func_pointer. Using jmpl is optional in all of these.

(You could use jmpw *%ax to truncate EIP to a 16-bit value. That assembles to 66 ff e0 jmpw *%ax.)
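The indirect forms all share opcode byte ff, and the ModRM byte that follows selects both the operation and the operand. As a rough sketch (the helper names are made up for this example, and only the 32-bit addressing cases that appear in this answer are handled), the fields can be split like this; the leading 66 in 66 ff e0 is the operand-size prefix and is not part of ModRM.

```python
def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

# For opcode FF, the reg field selects the operation (/2 = call, /4 = jmp).
FF_GROUP = {2: "call near", 4: "jmp near"}

def describe_ff(modrm):
    mod, reg, rm = decode_modrm(modrm)
    if mod == 0b11:
        operand = f"register #{rm}"      # register #0 is (e)ax
    elif mod == 0b00 and rm == 0b101:
        operand = "memory at disp32"     # 32-bit addressing, no base register
    else:
        operand = "other memory form"    # not needed for the cases here
    return f"{FF_GROUP.get(reg, 'other')} indirect via {operand}"

print(describe_ff(0x25))  # jmp near indirect via memory at disp32
print(describe_ff(0xE0))  # jmp near indirect via register #0
print(describe_ff(0x15))  # call near indirect via memory at disp32
```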


Compare "What is callq instruction?" and "What is the difference between retq and ret?": there the operand-size suffix behaves the way you expected, the same as plain call or plain ret. But jmp is different.


Semi-related: a far jmp or call (to a new CS:[ER]IP) in AT&T syntax is ljmp / lcall. These are very different.


It's also insane that GAS accepts jmpl main as equivalent to jmpl *main. It only warns instead of erroring.

$ gcc -no-pie -fno-pie -m32 jmp.s 
jmp.s: Assembler messages:
jmp.s:3: Warning: indirect jmp without `*'

And then disassembling it to see what we got, with objdump -drwC a.out:

08049156 <main>:                                          # corresponding source line (added by hand)
 8049156:       ff 25 56 91 04 08       jmp    *0x8049156    # jmpl main
 804915c:       ff 25 56 91 04 08       jmp    *0x8049156    # jmp  *main
 8049162:       ff 25 56 91 04 08       jmp    *0x8049156    # jmpl *main

08049168 <foo>:
 8049168:       e8 fb ff ff ff          call   8049168 <foo> # calll foo
 804916d:       ff 15 68 91 04 08       call   *0x8049168    # calll *foo
 8049173:       ff 15 68 91 04 08       call   *0x8049168    # call  *foo
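For contrast with the indirect forms, the relative encodings store a signed displacement measured from the end of the instruction. The sketch below (the function name `rel_target` is just for illustration) redoes that arithmetic for the calll foo line above and for the eb fe encoding from the question.

```python
def rel_target(insn_addr, insn_len, disp, disp_bits):
    """Target of a relative jmp/call: end of instruction + signed displacement."""
    sign_bit = 1 << (disp_bits - 1)
    signed = (disp ^ sign_bit) - sign_bit      # sign-extend the raw field
    return (insn_addr + insn_len + signed) & 0xFFFFFFFF

# e8 fb ff ff ff at 0x8049168: rel32 = 0xfffffffb (-5), length 5
print(hex(rel_target(0x8049168, 5, 0xFFFFFFFB, 32)))  # 0x8049168
# eb fe at 0x0804839b: rel8 = 0xfe (-2), length 2
print(hex(rel_target(0x0804839B, 2, 0xFE, 8)))        # 0x804839b
```

Both instructions target their own address, which matches the disassembler's call 8049168 <foo> and jmp 0x0804839b <main>.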

We get the same thing if we replace l with q in the source and build without -m32 (using the default -m64), including the same warning about a missing *. But the disassembly has an explicit jmpq and callq on every instruction, except for a relative direct jmp I added, which uses the plain jmp mnemonic in the disassembly.

It's like objdump thinks 32-bit is the default operand-size for jmp/call in both 32 and 64-bit mode, so it wants to always use a q suffix in 64-bit, but leaves it implicit in 32-bit mode. Anyway, that's just disassembly choice between implicit / explicit size suffixes, no weirdness for a programmer writing source code.


Other AT&T-syntax assemblers:

  • Clang's built-in assembler does reject jmpl main, requiring jmpl *main.

    $ clang -m32 jmp.s
    jmp.s:3:8: error: invalid operand for instruction
      jmpl main
           ^~~~
    

    calll main is the same as call main. call *main and calll *main are both accepted for indirect calls.

  • YASM's GAS-syntax mode assembles jmpl main to a near relative jmp, like jmp main! So it disagrees with gcc/clang about jmpl implying indirect. (Very few people use YASM in GAS mode; and these days its maintenance hasn't kept up with NASM for new instructions like AVX512. I like YASM's good defaults for long NOPs, but otherwise I'd recommend NASM.)

Peter Cordes answered Oct 20 '22 06:10


You have fallen victim to the awfulness that is AT&T syntax.

x86 assembly design has instruction suffix, such as l(long), w(word), b(byte).

No, it doesn't. The abomination that is AT&T syntax has this.
In the sane Intel syntax there are no such suffixes.

Is jmpl something different.

Yes, this is an indirect jump to an absolute address. A -near- jump to a -long- address.
(ljmp in gnu syntax is a -far- jump, but that's totally different, setting a new CS:EIP.)
The default for a jump is a near jump, to a relative address.
Note that the Intel syntax for this jump is:

jmp dword [ds:0x0804839b]  ; note the [] specifying the indirectness.
; or, this is the same
jmp [0x0804839b]
; or
jmp [main]
; or
jmp DWORD PTR ds:0x804839b  ; the PTR makes it indirect.

I prefer the [], to highlight the indirectness.

It does not jump to 0x0804839b, but reads a dword from the specified address and then jumps to the address specified in this dword. In the Intel syntax the indirectness is explicit.

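Of course you intended to jump to 0x0804839b (aka main:) directly.

As an absolute jump, that cannot be done: x86 has no near jmp encoding that takes an absolute immediate target (only relative displacements and indirect register/memory operands), and most assemblers do not let you write an absolute far jump directly either.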

See also: How to code a far absolute JMP/CALL instruction in MASM?

A near/short relative jump is (almost) always better, because it will still be valid when your code changes; the long jump can become invalid. Also shorter instructions are usually better, because they occupy less space in the instruction cache. The assembler (in Intel mode) will automatically select the correct jmp encoding for you.

SPARC
This is a totally different processor than the x86. From a different manufacturer, using a different paradigm. Obviously the SPARC documentation bears no relation to the x86 docs.

Intel's documentation for jmp (as mirrored in HTML form from the Intel manual) is here:

https://www.felixcloutier.com/x86/jmp

Note that Intel does not specify different mnemonics for the relative and absolute forms of the jmp. This is because Intel wants the assembler to always use the short (relative) jump, unless the target is too far away, in which case the near jmp rel32 encoding is used. (Or in 16-bit mode, jmp foo could assemble to a far absolute jump to a different CS value, aka segment. In 32-bit mode, a relative jmp rel32 can reach any other EIP value from anywhere.)
The beauty of this is that the assembler automatically uses the proper jump for you.
(In 64-bit mode, jumping more than +-2GiB requires extra instructions or a pointer in memory; there is no 64-bit absolute direct far jump, so the assembler can't do this for you automatically.)
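The selection between the short and near forms can be sketched roughly as follows. This is a simplified model for 32-bit mode only, not how any particular assembler is implemented (real assemblers also have to re-check sizes when other instructions change length); `encode_jmp` is a hypothetical helper.

```python
import struct

def encode_jmp(insn_addr, target):
    """Pick the shortest direct relative jmp encoding for 32-bit mode."""
    disp8 = target - (insn_addr + 2)           # relative to end of the 2-byte form
    if -128 <= disp8 <= 127:
        return b"\xeb" + struct.pack("<b", disp8)       # jmp rel8
    disp32 = (target - (insn_addr + 5)) & 0xFFFFFFFF    # wraps modulo 2**32
    return b"\xe9" + struct.pack("<I", disp32)          # jmp rel32

print(encode_jmp(0x0804839B, 0x0804839B).hex())  # ebfe
print(encode_jmp(0x08048000, 0x08049000).hex())  # e9fb0f0000
```

The first call reproduces the eb fe bytes from the question's Test1; the second target is out of rel8 range, so the 5-byte rel32 form is chosen.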

Forcing gnu back to sanity
You can use

 .intel_syntax noprefix    # as the first line in your assembly
 mov eax,[eax+100+ebx*2] 
 ....

This makes gnu use Intel syntax, putting things back the way Intel designed them and away from the PDP7 syntax used by gnu.

Johan answered Oct 20 '22 05:10