
Why use RIP-relative addressing in NASM?

I have an assembly hello world program for Mac OS X that looks like this:

global _main


section .text

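_main:
    mov rax, 0x2000004      ; macOS syscall number for write (BSD class 0x2000000 + 4)
    mov rdi, 1              ; fd 1 = stdout
    lea rsi, [rel msg]      ; rsi = address of msg, computed relative to RIP
    mov rdx, msg.len        ; number of bytes to write
    syscall

    mov rax, 0x2000001      ; macOS syscall number for exit (BSD class 0x2000000 + 1)
    mov rdi, 0              ; exit status 0
    syscall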


section .data

msg:    db  "Hello, World!", 10
.len:   equ $ - msg

I was wondering about the line lea rsi, [rel msg]. Why does NASM force me to do that? As I understand it, msg is just a pointer to some data in the executable, and doing mov rsi, msg would put that address into rsi. But if I replace the line lea rsi, [rel msg] with mov rsi, msg, NASM throws this error (note: I am using the command nasm -f macho64 hello.asm):

hello.asm:9: fatal: No section for index 2 offset 0 found

Why does this happen? What is so special about lea that mov can't do? How would I know when to use each one?

asked Jul 05 '15 at 19:07 by Jerfov2




1 Answer

What is so special about lea that mov can't do?

mov reg,imm loads an immediate constant into its destination operand. The immediate constant is encoded directly in the instruction bytes, right after the opcode: e.g. mov eax,someVar would be encoded as B8 EF CD AB 00 if the address of someVar is 0x00ABCDEF. That is, to encode such an instruction with imm being the address of msg, you need to know the exact address of msg. In position-independent code you don't know it a priori.
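To make the byte layout concrete, here is a minimal sketch that builds two `mov reg,imm` encodings by hand, using Python purely as a calculator: `mov eax,imm32` (opcode `B8` followed by a 4-byte little-endian immediate) and, for comparison, the 64-bit `mov rsi,imm64` form (REX.W prefix `48`, opcode `BE` = `B8`+6 for rsi, then an 8-byte immediate). The helper names are hypothetical, and `0x00ABCDEF` is just the example address from the text.

```python
import struct


def encode_mov_eax_imm32(imm: int) -> bytes:
    """Encode `mov eax, imm32`: opcode B8, then imm32 in little-endian order."""
    return bytes([0xB8]) + struct.pack("<I", imm)


def encode_mov_rsi_imm64(imm: int) -> bytes:
    """Encode `mov rsi, imm64`: REX.W (0x48), opcode B8+6 (0xBE), then imm64 little-endian."""
    return bytes([0x48, 0xB8 + 6]) + struct.pack("<Q", imm)


some_var = 0x00ABCDEF  # example address of someVar from the text
print(encode_mov_eax_imm32(some_var).hex(" ").upper())  # B8 EF CD AB 00
print(encode_mov_rsi_imm64(some_var).hex(" ").upper())  # 48 BE EF CD AB 00 00 00 00 00
# In both forms the absolute address itself is part of the instruction bytes.
```

Either way the absolute address is literally part of the instruction, which is why it has to be known before the code can run.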

mov reg,[expression] loads the value located at the address described by expression. The x86 encoding scheme allows fairly complex expressions: in general the address is reg1+reg2*s+displ, where s can be 1, 2, 4 or 8, reg1 and reg2 are general-purpose registers (either can be omitted), and displ is a constant displacement encoded in the instruction. In 64-bit mode the expression can take one more form, RIP+displ, i.e. the address is calculated relative to the start of the next instruction.
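The address arithmetic described above can be modeled in a few lines. This is a simplified sketch, assuming 64-bit addresses and ignoring segment bases and address-size overrides; `effective_address` and `rip_relative_address` are hypothetical helper names, and the register values are made up for illustration.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp, wrapped to 64 bits like the CPU does."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if not -(2**31) <= disp < 2**31:
        raise ValueError("displacement must fit in a signed 32-bit field")
    return (base + index * scale + disp) & (2**64 - 1)


def rip_relative_address(next_insn: int, disp: int) -> int:
    """RIP-relative form: the base is the address of the *next* instruction."""
    return effective_address(base=next_insn, disp=disp)


# [rbx + rcx*8 + 0x10] with rbx = 0x1000 and rcx = 3
print(hex(effective_address(base=0x1000, index=3, scale=8, disp=0x10)))  # 0x1028
# [rip + 0xff9] where the following instruction starts at 0x401007
print(hex(rip_relative_address(0x401007, 0xFF9)))  # 0x402000
```

`mov` would then read memory at the computed address, while `lea` would simply put the computed number into the destination register.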

lea reg,[expression] uses the same address calculation but loads the resulting address itself into reg (unlike mov, which dereferences the calculated address). This way, the information that is unavailable at assembly time, namely the absolute address that will be in RIP at run time, does not need to be encoded in the instruction: the CPU supplies it when the instruction executes. The NASM expression lea rsi,[rel msg] gets translated into something like

    lea rsi,[rip+(msg-nextInsn)]
nextInsn:

which encodes the relative offset msg-nextInsn instead of the absolute address of msg. That offset stays the same wherever the program is loaded, so the instruction can be encoded without knowing the actual address of msg.
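Here is a minimal sketch of that encoding, again using Python only to compute bytes. It assumes the 7-byte form `lea rsi,[rip+disp32]` (REX.W `48`, opcode `8D`, ModRM `35`, then a signed 32-bit displacement); the image layout and load addresses are hypothetical. It shows that the instruction bytes depend only on the distance between the instruction and msg, not on where the image is loaded.

```python
import struct

LEA_RSI_RIP_LEN = 7  # 48 8D 35 + 4-byte displacement


def encode_lea_rsi_rip(insn_addr: int, target: int) -> bytes:
    """Encode `lea rsi, [rip + (target - next_insn)]` placed at insn_addr."""
    next_insn = insn_addr + LEA_RSI_RIP_LEN
    disp = target - next_insn  # the relative offset msg - nextInsn
    # REX.W = 0x48, LEA opcode = 0x8D,
    # ModRM 0x35 = mod 00, reg 110 (rsi), r/m 101 (RIP + disp32 in 64-bit mode)
    return bytes([0x48, 0x8D, 0x35]) + struct.pack("<i", disp)


# Hypothetical layout: the lea sits at offset 0x10 of the image, msg at offset 0x2000.
code_off, msg_off = 0x10, 0x2000
for load_base in (0x100000000, 0x7FF012340000):
    insn = encode_lea_rsi_rip(load_base + code_off, load_base + msg_off)
    print(hex(load_base), insn.hex(" ").upper())
# Both lines show the same instruction bytes, 48 8D 35 E9 1F 00 00,
# because only the distance 0x2000 - (0x10 + 7) = 0x1FE9 is encoded.
```

By contrast, the `mov` encodings of an absolute address would change with every different load base.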

answered Sep 20 '22 at 13:09 by Ruslan