Assembly

I've been reading some assembly code and I've started seeing that call instructions are actually program counter relative.

However, whenever I'm using Visual Studio or WinDbg to debug, it always says call 0xFFFFFF ... which to me means it's saying I'm going to jump to that address.

Who is right? Is Visual Studio hiding the complexity of the instruction encoding and just showing what the program means? That is, the debugger knows it's a PC-relative instruction, and since it knows the PC, it just does the math for you?

Highly confused.

If you're disassembling .o object files that haven't been linked yet, the call address will just be a placeholder to be filled in by the linker.

You can use objdump -drwC -Mintel to show the relocation types + symbol names from a .o. (The -r option is the key. Or use -R for an already-linked shared library.)

It's more useful to the user to show the actual address of the jump target, rather than disassemble it as jcc eip-1234H or something. Linked executables have a default load address, so the disassembler has a value for eip at every instruction, and this address is usually shown in the disassembly output.

e.g. in some asm code I wrote (where I use symbol names that made it into the object file, so the loop branch target is actually visible to the disassembler):

objdump -M intel  -d rs-asmbench:
...
00000000004020a0 <.loop>:
  4020a0:       0f b6 c2                movzx  eax,dl
  4020a3:       0f b6 de                movzx  ebx,dh
   ...
  402166:       49 83 c3 10             add    r11,0x10
  40216a:       0f 85 30 ff ff ff       jne    4020a0 <.loop>

0000000000402170 <.last8>:
  402170:       0f b6 c2                movzx  eax,dl

Note that the jne instruction encodes a signed little-endian 32-bit displacement of -0xD0 bytes. (Jumps add their displacement to the value of e/rip after the jump, i.e. the address of the next instruction. The jump instruction itself is 6 bytes long, so the displacement has to be -0xD0, not just -0xCA.) 0x100 - 0xD0 = 0x30, which is the value of the least-significant byte of the 2's complement displacement.
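To check the arithmetic above by hand, here's a minimal Python sketch that decodes the displacement from the jne bytes in the listing and recomputes the target. The helper name decode_jcc_rel32 is made up for illustration; it only handles the 6-byte 0f 8x rel32 form and is not a general disassembler.

```python
import struct

def decode_jcc_rel32(address, encoding):
    """Return (displacement, target) for a 6-byte `0f 8x rel32` conditional jump.

    Hypothetical helper for illustration only.
    """
    if len(encoding) != 6 or encoding[0] != 0x0F or not 0x80 <= encoding[1] <= 0x8F:
        raise ValueError("not a near Jcc rel32 encoding")
    # "<i" means little-endian signed 32-bit integer.
    (disp,) = struct.unpack("<i", encoding[2:])
    # The displacement is relative to the address of the NEXT instruction.
    next_ip = address + len(encoding)
    return disp, next_ip + disp

# Bytes and address taken from the jne line in the listing above.
disp, target = decode_jcc_rel32(0x40216A, bytes.fromhex("0f8530ffffff"))
print(hex(disp), hex(target))  # -0xd0 0x4020a0
```

This is the same math the disassembler (or debugger) does before printing jne 4020a0.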

In your question, you're talking about the call addresses being 0xFFFF..., which makes little sense unless that's just a placeholder, or you thought the non-0xFF bytes in the displacement were part of the opcode.

Before linking, references to external symbols look like this:

objdump -M intel -d main.o
  ...
  a5:   31 f6                   xor    esi,esi
  a7:   e8 00 00 00 00          call   ac <main+0xac>
  ac:   4c 63 e0                movsxd r12,eax
  af:   ba 00 00 00 00          mov    edx,0x0
  b4:   48 89 de                mov    rsi,rbx
  b7:   44 89 f7                mov    edi,r14d
  ba:   e8 00 00 00 00          call   bf <main+0xbf>
  bf:   83 f8 ff                cmp    eax,0xffffffff
  c2:   75 cc                   jne    90 <main+0x90>
  ...

Notice how the call instructions have their relative displacement = 0. So before the linker has slotted in the actual relative value, they encode a call with a target of the instruction right after the call. (i.e. RIP = RIP+0). The call bf is immediately followed by an instruction that starts at 0xbf from the start of the section. The other call has a different target address because it's at a different place in the file. (gcc puts main in its own section: .text.startup).
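As a rough illustration of why the two placeholder calls print different targets, this Python sketch decodes e8 00 00 00 00 at both offsets from the listing. The function name call_rel32_target is an assumption of mine, not anything objdump exposes.

```python
import struct

def call_rel32_target(address, encoding):
    """Compute the target of a 5-byte `e8 rel32` call (illustrative helper)."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("not a call rel32 encoding")
    (disp,) = struct.unpack("<i", encoding[1:])  # signed little-endian rel32
    # The target is relative to the end of the call instruction.
    return address + len(encoding) + disp

# Same bytes at both call sites: the displacement field is still all zeros.
placeholder = bytes.fromhex("e800000000")
targets = {addr: call_rel32_target(addr, placeholder) for addr in (0xA7, 0xBA)}
for addr, tgt in targets.items():
    print(hex(addr), "->", hex(tgt))
# 0xa7 -> 0xac
# 0xba -> 0xbf
```

With a zero displacement the "target" is just the address of the following instruction, which is why it differs per call site even though the bytes are identical.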

So, if you want to make sense of what's actually being called, look at a linked executable, or get a disassembler that looks at the object file symbols to slot in symbolic names for call targets instead of showing them as calls with zero displacement.

Relative jumps to local symbols already get resolved before linking:

objdump -Mintel  -d asm-pinsrw.o:
0000000000000040 <.loop>:
  40:   0f b6 c2                movzx  eax,dl
  43:   0f b6 de                movzx  ebx,dh
  ...
 106:   49 83 c3 10             add    r11,0x10
 10a:   0f 85 30 ff ff ff       jne    40 <.loop>
0000000000000110 <.last8>:
 110:   0f b6 c2                movzx  eax,dl

Note the exact same instruction encoding for the relative jump to a symbol in the same file, even though the file has no base address, so the disassembler just treats it as zero.
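A quick way to convince yourself that the encoding really is independent of the base address: decode the same six bytes at both addresses from the two listings and compare. This is just a sketch; branch_target is a hypothetical helper that assumes the rel32 field is the last four bytes of the instruction.

```python
import struct

# Identical bytes in both the linked executable and the unlinked .o.
JNE = bytes.fromhex("0f8530ffffff")

def branch_target(address, encoding):
    """Target of a branch whose last 4 bytes are a rel32 (illustrative helper)."""
    (disp,) = struct.unpack("<i", encoding[-4:])
    return address + len(encoding) + disp

linked = branch_target(0x40216A, JNE)   # address in the linked executable
unlinked = branch_target(0x10A, JNE)    # offset in the .o, base treated as 0
print(hex(linked), hex(unlinked))  # 0x4020a0 0x40

# The distance from the jump back to .loop is the same in both cases.
print(hex(0x40216A - linked), hex(0x10A - unlinked))  # 0xca 0xca
```

Only the printed target changes with the base address; the bytes don't need to.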

See Intel's reference manual for instruction encoding; links are at https://stackoverflow.com/tags/x86/info. Even in 64-bit mode, call only supports 32-bit sign-extended relative offsets. 64-bit addresses are only supported as absolute targets, through an indirect call via a register or memory operand. (In 32-bit mode, 16-bit relative offsets are supported with an operand-size prefix, I guess saving one instruction byte.)
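To make the rel32 limit concrete, here's a small Python sketch going the other way: it encodes a call rel32 for a given call site and target, and refuses targets that don't fit in a signed 32-bit displacement. The function name encode_call_rel32 and both addresses are made up for the example.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def encode_call_rel32(call_address, target):
    """Encode `call target` as e8 + rel32, or raise if out of range (sketch)."""
    # Displacement is measured from the end of the 5-byte call instruction.
    disp = target - (call_address + 5)
    if not INT32_MIN <= disp <= INT32_MAX:
        raise OverflowError("target is not within +/-2GiB of the call site")
    return b"\xe8" + struct.pack("<i", disp)

# Hypothetical addresses: a call at 0x401000 to a function at 0x402000.
near = encode_call_rel32(0x401000, 0x402000)
print(near.hex())  # e8fb0f0000

# A target far outside the +/-2GiB window can't be reached with rel32.
try:
    encode_call_rel32(0x401000, 0x7FFF00000000)
    reachable = True
except OverflowError:
    reachable = False
print(reachable)  # False
```

For the out-of-range case, real code would load the address into a register and use an indirect call instead.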

Assembly - x86 call instruction and memory address?

Tags:

x86

linker

nasm

masm

halivingston

Peter Cordes
