
Does x64 assembler use 4-byte or 8-byte memory addresses?

I'm at the end of the chapter in the 2019 book "Beginning x64 Assembly Programming: From Novice to AVX Professional" by Jo Van Hoey...

Here is the excerpt (skip to the bold text for the problem):

     1                                  ; hello.asm
     2                                  section .data                                   
     3 00000000 68656C6C6F2C20776F-         msg db      "hello, world",0                       
     3 00000009 726C6400           
     4                                  section .bss                                                   
     5                                  section .text                                                   
     6                                      global main                                 
     7                                  main:
     8                                                                                   
     9 00000000 B801000000                      mov     rax, 1          ; 1 = write     
    10 00000005 BF01000000                  mov     rdi, 1              ; 1 = to stdout         
    11 0000000A 48BE-                       mov     rsi, msg    ; string to display in rsi
    11 0000000C [0000000000000000]
    12 00000014 BA0C000000                  mov     rdx, 12             ; length of the string, without 0
    13 00000019 0F05                        syscall                             ; display the string
    14 0000001B B83C000000                  mov     rax, 60             ; 60 = exit     
    15 00000020 BF00000000                  mov     rdi, 0              ; 0 = success exit code
    16 00000025 0F05                        syscall                             ; quit

Figure 1-4

Figure 1-4 shows our hello.lst. You have a column with the line numbers and then a column with eight digits. This column represents memory locations. When the assembler built the object file, it didn't know yet what memory locations would be used. So, it started at location 0 for the different sections. The section .bss part has no memory.

We see in the second column the result of the conversion of the assembly instruction into hexadecimal code. For example, mov rax is converted to B8 and mov rdi to BF. These are the hexadecimal representations of the machine instructions. Note also the conversion of the msg string to hexadecimal ASCII characters. Later you'll learn more about hexadecimal notation. The first instruction to be executed starts at address 00000000 [eight zeroes] and takes five bytes: B8 01 00 00 00. The double zeros are there for padding and memory alignment

...

The next instruction starts at address 00000005 [seven zeroes], and so on. The memory addresses have eight digits (that is, 8 bytes); each byte has 8 bits. So, the addresses have 64 bits; indeed, we are using a 64-bit assembler. Look at how msg is referenced. Because the memory location of msg is not known yet, it is referred to as [0000000000000000] [16 zeroes]

I am new to assembly, so my understanding is that the second column in Figure 1-4 (for example, 00000005) has eight digits, and since each digit is hexadecimal, each digit represents 4 bits, with a maximum value of 0xF (2^4 = 16 possible values).

This is hello.asm:

; hello.asm
section .data
    msg db      "hello, world",0
section .bss
section .text
    global main
main:
    mov     rax, 1              ; 1 = write
    mov     rdi, 1              ; 1 = to stdout
    mov     rsi, msg            ; string to display in rsi
    mov     rdx, 12             ; length of the string, without 0
    syscall                     ; display the string
    mov     rax, 60             ; 60 = exit
    mov     rdi, 0              ; 0 = success exit code
    syscall                     ; quit

And, this is makefile:

#makefile for hello.asm
hello: hello.o
        gcc -o hello hello.o -no-pie
hello.o: hello.asm
        nasm -f elf64 -g -F dwarf hello.asm -l hello.lst

Can someone please help me understand the second column in Figure 1-4, or the memory addresses created by the assembler?

asked Mar 04 '26 23:03 by Hugh Myron


1 Answer

That's a NASM listing, like you'd get on the terminal from nasm -l /dev/stdout -f elf64 hello.asm. The second column (the first after the line numbers) is the offset relative to the start of the section. Eight hex digits make a 32-bit offset or address; the book is wrong about that. Each hex digit represents only 4 bits, not 8 as the book seems to claim.
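As a quick sanity check of that arithmetic, here is a minimal Python sketch (Python is used only as a calculator here; the helper name bits_for_hex_digits is made up for illustration). It counts 4 bits per hex digit for the two digit strings that appear in the listing.

```python
# Each hexadecimal digit encodes 4 bits (one nibble),
# so the width of a hex string in bits is digits * 4.
BITS_PER_HEX_DIGIT = 4

def bits_for_hex_digits(text: str) -> int:
    """Return how many bits a string of hex digits can represent."""
    return len(text) * BITS_PER_HEX_DIGIT

offset_column = "00000005"         # offset column from the listing (8 digits)
placeholder = "0000000000000000"   # placeholder for msg in the listing (16 digits)

print(len(offset_column), "digits ->", bits_for_hex_digits(offset_column), "bits")
print(len(placeholder), "digits ->", bits_for_hex_digits(placeholder), "bits")
```

The 8-digit column comes out as 32 bits, and only the 16-digit placeholder for msg is 64 bits wide.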

Since it's an offset starting from 0 (the same value you get from the $ special symbol in NASM source), there's no way for it to overflow a 32-bit number unless you write something like times 1024*1024*1024 dq 1,2,3,4 that generates more than 4 GiB of data in a section. That's possible, but you probably don't want to be looking at a text listing of it.
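To put a number on that example, the following Python sketch works out how much data the times line would emit. It assumes only that dq emits 8 bytes per value; the variable names are illustrative.

```python
GIB = 1024 ** 3

repeat_count = 1024 * 1024 * 1024   # the count given to `times`
qwords_per_repeat = 4               # dq 1,2,3,4 emits four values
bytes_per_qword = 8                 # dq = "define quadword" = 8 bytes each

total_bytes = repeat_count * qwords_per_repeat * bytes_per_qword
max_32bit_offset = 2 ** 32 - 1      # largest value that fits in 8 hex digits

print("section size:", total_bytes // GIB, "GiB")
print("exceeds 32-bit offset range:", total_bytes > max_32bit_offset)
```

Anything smaller than 4 GiB per section still fits in the 8-digit offset column.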

The book continues: "Look at how msg is referenced. Because the memory location of msg is not known yet, it is referred to as [0000000000000000] [16 zeroes]"

That part is correct, unlike the previous sentence. mov r64, imm64 is the least efficient way to put an address into a register, but it's what you get from that asm source, which does things the way you would in 16- or 32-bit mode.

And indeed, the linker will fill in that 64-bit absolute address. Use objdump -drwC -Mintel to disassemble the linked executable to see it. (GAS .intel_syntax noprefix is MASM-like, but the difference from NASM is mainly in addressing modes. Agner Fog's objconv can disassemble into NASM syntax.)
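The listing itself confirms the 10-byte mov r64, imm64 encoding. The Python sketch below takes the bytes of mov rsi, msg exactly as they appear in the question's listing (48 BE plus the eight-byte placeholder) and checks that they account for the gap between offsets 0000000A and 00000014. The variable names are illustrative.

```python
# Bytes of `mov rsi, msg` from the listing: 48 BE, then the 8-byte placeholder
# that the linker later overwrites with the absolute address of msg.
insn = bytes.fromhex("48BE" + "00" * 8)

start_offset = 0x0A                          # offset column for this instruction
opcode = insn[1]
register_number = opcode - 0xB8              # B8+r form of mov; 6 selects rsi
imm64 = int.from_bytes(insn[2:], "little")   # immediates are little-endian
next_offset = start_offset + len(insn)

print("instruction length:", len(insn), "bytes")
print("register number:", register_number)
print("placeholder immediate:", hex(imm64))
print("next instruction offset:", f"{next_offset:08X}")
```

The computed next offset matches the 00000014 shown for mov rdx, 12 in the listing.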


Modern 64-bit code normally uses RIP-relative LEA, like lea rsi, [rel msg], which generates a 64-bit absolute address at run time from machine code containing a 32-bit relative displacement from the program counter (instruction pointer). See "How to load address of function or label into register" for the three options, which for a -no-pie Linux executable also include mov esi, msg. Compilers will do that when you compile with -fno-pie (a code-gen option).
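The RIP-relative calculation can be sketched in a few lines of Python. The two addresses below are made-up values for illustration, not taken from a real link of hello.asm, and the function names are hypothetical. The three leading bytes 48 8D 35 are the REX.W prefix, the LEA opcode, and the ModRM byte that selects rsi with RIP-relative addressing.

```python
import struct

# REX.W prefix, LEA opcode, ModRM (mod=00, reg=rsi, rm=101 -> RIP-relative)
LEA_RSI_RIP_REL = bytes([0x48, 0x8D, 0x35])

def encode_lea_rsi_rel(insn_addr: int, target_addr: int) -> bytes:
    """Encode lea rsi, [rel target] for an instruction placed at insn_addr."""
    insn_len = len(LEA_RSI_RIP_REL) + 4          # 3 bytes + 32-bit displacement
    # The displacement is relative to the END of the instruction,
    # which is the value RIP holds while the instruction executes.
    disp = target_addr - (insn_addr + insn_len)
    return LEA_RSI_RIP_REL + struct.pack("<i", disp)   # signed, little-endian

def resolve(insn_addr: int, machine_code: bytes) -> int:
    """Compute the address the CPU would produce at run time."""
    disp = struct.unpack("<i", machine_code[3:7])[0]
    return insn_addr + len(machine_code) + disp

# Hypothetical addresses, chosen only for the example.
code = encode_lea_rsi_rel(insn_addr=0x401110, target_addr=0x404028)
print(code.hex(" "), "->", hex(resolve(0x401110, code)))
```

The instruction is 7 bytes instead of 10, and the same bytes stay valid wherever the program is loaded, as long as code and data keep their relative distance.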

That will assemble to the same 32-bit operand-size mov r32, imm32 that NASM used for mov rax, 1 (which it optimized to mov eax, 1). Note the lack of a 0x4? REX prefix byte (0x40 to 0x4F) at the start of the other instructions. "Why NASM on Linux changes registers in x86_64 assembly" covers the 3 encodings of mov that can write a whole 64-bit register.
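The missing REX prefix is visible in the listing bytes. The Python sketch below uses the machine code copied from the question's listing and assumes each instruction starts with no legacy prefixes, which holds for these four. The helper names are illustrative.

```python
# Machine-code bytes copied from the listing (hello.lst) in the question.
LISTING = {
    "mov rax, 1": "B801000000",
    "mov rdi, 1": "BF01000000",
    "mov rsi, msg": "48BE0000000000000000",
    "mov rdx, 12": "BA0C000000",
}

def has_rex_prefix(code: bytes) -> bool:
    """In 64-bit mode, a leading byte in 0x40..0x4F is a REX prefix.
    Assumes no legacy prefixes come before it."""
    return 0x40 <= code[0] <= 0x4F

def imm32(code: bytes) -> int:
    """Little-endian 32-bit immediate that follows a one-byte B8+r opcode."""
    return int.from_bytes(code[1:5], "little")

for text, hex_bytes in LISTING.items():
    code = bytes.fromhex(hex_bytes)
    if has_rex_prefix(code):
        print(f"{text:14} REX prefix 0x{code[0]:02X}, {len(code)} bytes")
    else:
        print(f"{text:14} no REX, imm32 = {imm32(code)}, {len(code)} bytes")
```

Only mov rsi, msg carries a REX prefix (0x48); the others are the 5-byte mov r32, imm32 form, which zero-extends into the full 64-bit register.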

answered Mar 06 '26 11:03 by Peter Cordes


