I'm at the end of the chapter in the 2019 book "Beginning x64 Assembly Programming: From Novice to AVX Professional" by Jo Van Hoey...
Here is the excerpt (skip to the bold text for the problem):
1 ; hello.asm
2 section .data
3 00000000 68656C6C6F2C20776F- msg db "hello, world",0
3 00000009 726C6400
4 section .bss
5 section .text
6 global main
7 main:
8
9 00000000 B801000000 mov rax, 1 ; 1 = write
10 00000005 BF01000000 mov rdi, 1 ; 1 = to stdout
11 0000000A 48BE- mov rsi, msg ; string to display in rsi
11 0000000C [0000000000000000]
12 00000014 BA0C000000 mov rdx, 12 ; length of the string, without 0
13 00000019 0F05 syscall ; display the string
14 0000001B B83C000000 mov rax, 60 ; 60 = exit
15 00000020 BF00000000 mov rdi, 0 ; 0 = success exit code
16 00000025 0F05 syscall ; quit
Figure 1-4
Figure 1-4 shows our hello.lst. You have a column with the line numbers and then a column with eight digits. This column represents memory locations. When the assembler built the object file, it didn't know yet what memory locations would be used. So, it started at location 0 for the different sections. The section .bss part has no memory. We see in the second column the result of the conversion of the assembly instruction into hexadecimal code. For example, mov rax is converted to B8 and mov rdi to BF. These are the hexadecimal representations of the machine instructions. Note also the conversion of the msg string to hexadecimal ASCII characters. Later you'll learn more about hexadecimal notation. The first instruction to be executed starts at address 00000000 [eight zeroes] and takes five bytes: B8 01 00 00 00. The double zeros are there for padding and memory alignment...
The next instruction starts at address 00000005 [seven zeroes], and so on. The memory addresses have eight digits (that is, 8 bytes); each byte has 8 bits. So, the addresses have 64 bits; indeed, we are using a 64-bit assembler. Look at how msg is referenced. Because the memory location of msg is not known yet, it is referred to as [0000000000000000] [16 zeroes]
I am new to assembly, so my understanding is that the second column in Figure 1-4 (for example 00000005) has eight digits, and since each digit is hexadecimal, each digit represents 4 bits: a maximum value of 0xF (15), or one of 2^4 = 16 possible values. That would make eight digits 32 bits, not the 64 bits the book describes.
This is hello.asm:
; hello.asm
section .data
    msg db "hello, world",0
section .bss
section .text
    global main
main:
    mov rax, 1      ; 1 = write
    mov rdi, 1      ; 1 = to stdout
    mov rsi, msg    ; string to display in rsi
    mov rdx, 12     ; length of the string, without 0
    syscall         ; display the string
    mov rax, 60     ; 60 = exit
    mov rdi, 0      ; 0 = success exit code
    syscall         ; quit
And, this is makefile:
#makefile for hello.asm
hello: hello.o
	gcc -o hello hello.o -no-pie
hello.o: hello.asm
	nasm -f elf64 -g -F dwarf hello.asm -l hello.lst
Can someone please help me understand the second column in Figure 1-4, or the memory addresses created by the assembler?
That's a NASM listing, like you'd get on the terminal from nasm -l /dev/stdout -f elf64 hello.asm. The second column (the first after the line numbers) is the offset relative to the start of the section. Eight hex digits is a 32-bit offset or address; the book is wrong about that. Each hex digit represents only 4 bits, not 8 as the book seems to claim.
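To make the digit arithmetic concrete, here is a small Python sketch (not from the book or from NASM; the helper name bits_for_hex_digits is made up for illustration). It multiplies the number of hex digits by 4 bits per digit for the two fields in the listing.

```python
# Each hexadecimal digit encodes 4 bits (values 0x0 through 0xF).
BITS_PER_HEX_DIGIT = 4


def bits_for_hex_digits(n_digits: int) -> int:
    """Return how many bits a field of n_digits hex digits can hold."""
    return n_digits * BITS_PER_HEX_DIGIT


offset_field = "00000005"          # offset column of the listing
placeholder = "0000000000000000"   # the [...] placeholder for msg

print(bits_for_hex_digits(len(offset_field)))  # 8 digits  -> 32 bits
print(bits_for_hex_digits(len(placeholder)))   # 16 digits -> 64 bits
print(int("F", 16), 2 ** 4)                    # largest digit value, number of values
```

So the eight-digit column is 4 bytes wide, and only the 16-digit placeholder is a 64-bit field.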
Since it's an offset starting from 0 (the same value you get from the $ special symbol in NASM source), there's no way for it to overflow a 32-bit number without you writing code like times 1024*1024*1024 dq 1,2,3,4 or something that will generate more than 4GiB of data in a section. That's possible, but you probably don't want to be looking at a text listing of it.
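As a rough check that the column really is a running byte offset within the section, this Python sketch recomputes it from the machine-code bytes shown in the .text part of the listing. The byte strings are copied from Figure 1-4, with the 8-byte msg placeholder written out as zeros; the function name section_offsets is just an illustrative choice.

```python
# Machine-code bytes copied from the .text lines of the listing, in order.
# The 8 placeholder bytes for msg are written out as zeros.
text_lines = [
    ("mov rax, 1",   "B801000000"),
    ("mov rdi, 1",   "BF01000000"),
    ("mov rsi, msg", "48BE" + "00" * 8),
    ("mov rdx, 12",  "BA0C000000"),
    ("syscall",      "0F05"),
    ("mov rax, 60",  "B83C000000"),
    ("mov rdi, 0",   "BF00000000"),
    ("syscall",      "0F05"),
]


def section_offsets(lines):
    """Return each line's offset: the total number of bytes emitted before it."""
    offsets, position = [], 0
    for _source, hex_bytes in lines:
        offsets.append(position)
        position += len(bytes.fromhex(hex_bytes))
    return offsets


offsets = section_offsets(text_lines)
for (source, _hex), offset in zip(text_lines, offsets):
    print(f"{offset:08X}  {source}")
```

The printed offsets (00000000, 00000005, 0000000A, 00000014, ...) match the listing, which is what you would expect if the column simply counts bytes from the start of the section.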
Look at how msg is referenced. Because the memory location of msg is not known yet, it is referred to as [0000000000000000] [16 zeroes]
That part is correct, unlike the previous sentence. mov r64, imm64 is the least efficient way to put an address into a register, but it's what you get from that asm source that does things the way you would in 16 or 32-bit mode.
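The size difference between the two mov forms is visible in the listing bytes themselves. The following Python sketch decodes only the two mov-immediate encodings that appear in hello.lst: opcode B8+reg with a 4-byte immediate, and the same opcode behind a 0x48 REX.W prefix with an 8-byte immediate. It is a deliberately tiny illustration, not a general disassembler, and decode_mov_imm is a made-up name.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_imm(hex_bytes: str):
    """Decode 'mov reg, imm' for the two encodings used in hello.lst.

    Returns (register name, immediate value, instruction length in bytes).
    Only a bare 0x48 (REX.W) prefix is handled; r8-r15 are out of scope.
    """
    code = bytes.fromhex(hex_bytes)
    rex_w = code[0] == 0x48            # REX prefix with W=1: 64-bit operand size
    opcode = code[1] if rex_w else code[0]
    if not 0xB8 <= opcode <= 0xBF:
        raise ValueError("not a mov reg, imm encoding handled by this sketch")
    reg = opcode - 0xB8                # low 3 bits of the opcode select the register
    imm = code[2:] if rex_w else code[1:]
    if len(imm) != (8 if rex_w else 4):
        raise ValueError("unexpected immediate size")
    value = int.from_bytes(imm, "little")   # x86 immediates are little-endian
    name = (REG64 if rex_w else REG32)[reg]
    return name, value, len(code)


print(decode_mov_imm("B801000000"))          # listing line 9
print(decode_mov_imm("BF01000000"))          # listing line 10
print(decode_mov_imm("48BE" + "00" * 8))     # listing line 11, placeholder as zeros
print(decode_mov_imm("BA0C000000"))          # listing line 12
```

The mov rsi, msg line decodes as a 10-byte instruction with a 64-bit immediate, while the other mov lines are 5 bytes each with a 32-bit immediate.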
And indeed, the linker will fill in that 64-bit absolute address. Use objdump -drwC -Mintel to disassemble the linked executable to see it. (GAS .intel_syntax noprefix is MASM-like, but the difference from NASM is mainly only in addressing modes. Agner Fog's objconv can disassemble into NASM syntax.)
Modern 64-bit code normally uses RIP-relative LEA, like lea rsi, [rel msg], which generates a 64-bit absolute address at run-time from machine code containing a 32-bit relative displacement from the program counter (instruction pointer). See "How to load address of function or label into register" for the three options, which for a -no-pie Linux executable also include mov esi, msg. Compilers will do that when you compile with -fno-pie (a code-gen option).
That will assemble to the same 32-bit operand-size mov r32, imm32 that NASM used for mov rax, 1 (which it optimized to mov eax, 1). Note the lack of a 0x4? REX prefix byte at the start of the other instructions. "Why NASM on Linux changes registers in x86_64 assembly" covers the 3 encodings of mov that can write a whole 64-bit register.
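For the RIP-relative LEA mentioned earlier, the 32-bit displacement is measured from the end of the instruction. The Python sketch below shows that arithmetic for lea rsi, [rel msg], which encodes as 48 8D 35 followed by a 4-byte displacement. The two addresses are hypothetical values picked for illustration, not taken from a real link of hello.asm, and the function names are invented.

```python
import struct

# REX.W prefix, LEA opcode, ModRM byte selecting rsi with [rip + disp32].
LEA_RSI_RIP = bytes([0x48, 0x8D, 0x35])


def encode_lea_rsi_rel(insn_addr: int, target_addr: int) -> bytes:
    """Encode lea rsi, [rip + disp32] so that it yields target_addr."""
    next_insn = insn_addr + len(LEA_RSI_RIP) + 4   # RIP points past this instruction
    disp = target_addr - next_insn
    return LEA_RSI_RIP + struct.pack("<i", disp)   # signed 32-bit, little-endian


def resolve_lea(insn_addr: int, code: bytes) -> int:
    """Compute the address the CPU would load: next RIP plus displacement."""
    (disp,) = struct.unpack("<i", code[3:7])
    return insn_addr + len(code) + disp


# Hypothetical addresses, assumed only for this example.
insn_addr = 0x401110
msg_addr = 0x404028

code = encode_lea_rsi_rel(insn_addr, msg_addr)
print(code.hex(" "))
print(hex(resolve_lea(insn_addr, code)))
```

The instruction is 7 bytes instead of the 10 needed for mov rsi, imm64, and it works wherever the code and data are loaded, as long as they stay within 2 GiB of each other.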