I've been reading some assembly code and I've started seeing that call instructions are actually program counter relative.
However, whenever I'm using Visual Studio or WinDbg to debug, it always says call 0xFFFFFF ... which to me means it's saying I'm going to jump to that address.
Who is right? Is Visual Studio hiding the complexity of the instruction encoding and just showing what the program means? That is, does the debugger know it's a PC-relative instruction and, since it knows the PC, just do the math for you?
Highly confused.
If you're disassembling .o object files that haven't been linked yet, the call address will just be a placeholder to be filled in by the linker.
You can use objdump -drwC -Mintel to show the relocation types + symbol names from a .o file. (The -r option is the key. Or -R for an already-linked shared library.)
It's more useful to the user to show the actual address of the jump target, rather than disassemble it as jcc eip-1234H or something. Object files have a default load address, so the disassembler has a value for eip at every instruction, and this is usually present in disassembly output.
e.g. in some asm code I wrote (where I use symbol names that made it into the object file, so the loop branch target is actually visible to the disassembler):
objdump -M intel -d rs-asmbench:
...
00000000004020a0 <.loop>:
4020a0: 0f b6 c2 movzx eax,dl
4020a3: 0f b6 de movzx ebx,dh
...
402166: 49 83 c3 10 add r11,0x10
40216a: 0f 85 30 ff ff ff jne 4020a0 <.loop>
0000000000402170 <.last8>:
402170: 0f b6 c2 movzx eax,dl
Note that the encoding of the jne instruction ends with a signed little-endian 32bit displacement of -0xD0 bytes (the bytes 30 ff ff ff). Jumps add their displacement to the value of e/rip after the jump, i.e. the address of the next instruction. The jump instruction itself is 6 bytes long, so the displacement has to be -0xD0, not just -0xCA. 0x100 - 0xD0 = 0x30, which is the value of the least-significant byte of the 2's complement displacement.
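To check that arithmetic by hand, here is a minimal Python sketch that does what the disassembler does: unpack the last four bytes as a signed little-endian integer and add it to the address of the next instruction. The helper name branch_target is made up for this illustration, and it assumes the instruction ends in a rel32 field (as jne 0f 85 and call e8 do).

```python
import struct

def branch_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Absolute target of a branch whose last 4 bytes are a rel32 field.

    Hypothetical helper for illustration; not part of any real tool.
    """
    # Signed, little-endian 32-bit displacement.
    (disp,) = struct.unpack("<i", insn_bytes[-4:])
    # The displacement is relative to the END of the instruction.
    next_insn = insn_addr + len(insn_bytes)
    return next_insn + disp

jne = bytes.fromhex("0f8530ffffff")        # bytes from the listing above
(disp,) = struct.unpack("<i", jne[-4:])
print(hex(disp))                           # -0xd0
print(hex(branch_target(0x40216A, jne)))   # 0x4020a0
```

So the debugger really is just doing this addition for you and printing the result.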
In your question, you're talking about the call addresses being 0xFFFF..., which makes little sense unless that's just a placeholder, or you thought the non-0xFF bytes in the displacement were part of the opcode.
Before linking, references to external symbols look like this:
objdump -M intel -d main.o
...
a5: 31 f6 xor esi,esi
a7: e8 00 00 00 00 call ac <main+0xac>
ac: 4c 63 e0 movsxd r12,eax
af: ba 00 00 00 00 mov edx,0x0
b4: 48 89 de mov rsi,rbx
b7: 44 89 f7 mov edi,r14d
ba: e8 00 00 00 00 call bf <main+0xbf>
bf: 83 f8 ff cmp eax,0xffffffff
c2: 75 cc jne 90 <main+0x90>
...
Notice how the call instructions have their relative displacement = 0. So before the linker has slotted in the actual relative value, they encode a call with a target of the instruction right after the call. (i.e. RIP = RIP+0). The call bf is immediately followed by an instruction that starts at 0xbf from the start of the section. The other call has a different target address because it's at a different place in the file. (gcc puts main in its own section: .text.startup).
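As a rough sketch of what "slotting in the actual relative value" means, the snippet below patches the zero displacement of a call the way a PC-relative relocation is applied on x86-64 ELF: value = S + A - P, where S is the symbol address, P is the address of the 4-byte field, and A is the addend (-4, because the CPU measures from the end of the field). The function name patch_call_rel32 and both addresses are hypothetical, chosen only for illustration.

```python
import struct

def patch_call_rel32(code: bytearray, field_offset: int, field_addr: int,
                     symbol_addr: int, addend: int = -4) -> None:
    """Fill in a rel32 field in place (hypothetical helper, not a real linker API)."""
    # PC-relative relocation: S + A - P
    value = symbol_addr + addend - field_addr
    code[field_offset:field_offset + 4] = struct.pack("<i", value)

# Unlinked "call" as shown by objdump: opcode e8 plus a zero placeholder.
code = bytearray.fromhex("e800000000")

# Assumed final layout: the call lands at 0x4010a7, so its displacement
# field sits at 0x4010a8; the callee is assumed to live at 0x401500.
patch_call_rel32(code, field_offset=1, field_addr=0x4010A8, symbol_addr=0x401500)
print(code.hex())   # e854040000
```

Decoding the patched bytes gives 0x4010ac + 0x454 = 0x401500, the assumed callee address.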
So, if you want to make sense of what's actually being called, look at a linked executable, or get a disassembler that looks at the object file symbols to slot in symbolic names for call targets instead of showing them as calls with zero displacement.
Relative jumps to local symbols already get resolved before linking:
objdump -Mintel -d asm-pinsrw.o:
0000000000000040 <.loop>:
40: 0f b6 c2 movzx eax,dl
43: 0f b6 de movzx ebx,dh
...
106: 49 83 c3 10 add r11,0x10
10a: 0f 85 30 ff ff ff jne 40 <.loop>
0000000000000110 <.last8>:
110: 0f b6 c2 movzx eax,dl
Note that the relative jump to a symbol in the same file has the exact same instruction encoding as in the linked executable, even though the object file has no base address, so the disassembler just treats it as zero.
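A small Python sketch of why the bytes don't change: the displacement only depends on the distance between the end of the jump and its target, so shifting both by the same base address cancels out. The helper encode_jne_rel32 is a made-up name for this illustration; the addresses are the ones from the two listings.

```python
import struct

def encode_jne_rel32(insn_addr: int, target: int) -> bytes:
    """Encode jne rel32 (0f 85 + disp32). Hypothetical helper for illustration."""
    insn_len = 6                           # 2 opcode bytes + 4 displacement bytes
    disp = target - (insn_addr + insn_len)
    return b"\x0f\x85" + struct.pack("<i", disp)

unlinked = encode_jne_rel32(0x10A, 0x40)            # addresses in the .o file
linked = encode_jne_rel32(0x40216A, 0x4020A0)       # addresses in the executable
print(unlinked.hex())   # 0f8530ffffff
print(linked.hex())     # 0f8530ffffff
```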
See Intel's reference manual for instruction encoding. Links at https://stackoverflow.com/tags/x86/info. Even in 64bit mode, a direct call only supports 32bit sign-extended relative offsets. 64bit targets are only supported as absolute addresses, via an indirect call through a register or memory operand. (In 32bit mode, 16bit relative offsets are supported, with an operand-size prefix, I guess saving one instruction byte.)
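To make the 32bit limit concrete, here is a short Python sketch that checks whether a target is reachable from a given call site with a direct call rel32. The function name fits_rel32 and the example addresses are assumptions made up for illustration; the 5-byte length is the e8 opcode plus the 4-byte displacement.

```python
def fits_rel32(call_addr: int, target: int, insn_len: int = 5) -> bool:
    """True if target is reachable with a signed 32-bit displacement.

    Hypothetical helper; addresses below are illustrative only.
    """
    disp = target - (call_addr + insn_len)
    return -2**31 <= disp <= 2**31 - 1

print(fits_rel32(0x401000, 0x401500))           # True
print(fits_rel32(0x401000, 0x7FFF00000000))     # False
```

When the check fails, the target has to be reached indirectly, for example by loading the 64bit address into a register and calling through it.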