In short, when I have multiple db labels in my .data section, the addresses NASM emits for them are wrong. In my testing they are off by 256 bytes in the resulting Mach-O binary.
Software I am using:
nasm: NASM version 2.11.08, installed via Homebrew as required for x86_64 ASM
gobjdump: GNU objdump (GNU Binutils) 2.25.1, installed via Homebrew
clang: Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)

Take for example the following "hello world" NASM assembly.
main.s
global _main
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msg]
mov rdx, len
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
section .data
msg: db "Hello, world!", 10
len: equ $ - msg
Compiled and run with:
/usr/local/bin/nasm -f macho64 -o main.o main.s
clang -o main main.o
./main
This works great, and produces the following output:
Hello, world!
Now, to add another message, we just need to add another string to the data section, and another syscall. Simple enough.
main.s
global _main
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msga]
mov rdx, lena
syscall
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel msgb]
mov rdx, lenb
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
section .data
msga: db "Hello, world!", 10
lena: equ $ - msga
msgb: db "Break things!", 10
lenb: equ $ - msgb
Compile and run, same as before, and we get:
Break things!
What?!? Shouldn't we be getting this instead?
Hello, world!
Break things!
Something clearly went wrong. Time to disassemble the resulting binary and see what we got.
$ gobjdump -d -M intel main
Produces the following for _main:
0000000100000f7c <_main>:
100000f7c:b8 04 00 00 02 mov eax,0x2000004
100000f81:bf 01 00 00 00 mov edi,0x1
100000f86:48 8d 35 73 01 00 00 lea rsi,[rip+0x173] # 100001100 <msgb+0xf2>
100000f8d:ba 0e 00 00 00 mov edx,0xe
100000f92:0f 05 syscall
100000f94:b8 04 00 00 02 mov eax,0x2000004
100000f99:bf 01 00 00 00 mov edi,0x1
100000f9e:48 8d 35 69 00 00 00 lea rsi,[rip+0x69] # 10000100e <msgb>
100000fa5:ba 0e 00 00 00 mov edx,0xe
100000faa:0f 05 syscall
100000fac:b8 01 00 00 02 mov eax,0x2000001
100000fb1:bf 00 00 00 00 mov edi,0x0
100000fb6:0f 05 syscall
From the comment # 100001100 <msgb+0xf2>, we can see that the first lea is pointing not to the msga symbol, but to 0xf2 past msgb, or 0x100001100 (at this address there are null bytes, resulting in no output). Inspecting the binary in a hex editor, I find the actual msga string at file offset 0x1000, or address 0x100001000. This means that the address in the compiled binary is now off by 0x100 (256) bytes, simply because there is now a second db label. What?!?
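To double-check that arithmetic, here is a small C sketch that recomputes both lea targets from the listing. The function names are made up for illustration; the instruction addresses, the 7-byte instruction length, and the displacements are copied from the gobjdump output above, and the location of msga is derived by assuming msgb sits directly after the 14-byte msga string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Target of a RIP-relative operand: address of the NEXT instruction
   (instruction address + instruction length) plus the signed displacement. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Recompute the two lea targets using values from the disassembly. */
void print_lea_targets(void)
{
    uint64_t first  = rip_relative_target(0x100000f86ULL, 7, 0x173); /* meant for msga */
    uint64_t second = rip_relative_target(0x100000f9eULL, 7, 0x69);  /* msgb */

    /* Assumption: msgb directly follows msga, which is 14 bytes long. */
    uint64_t msga = second - 14;

    /* The first lea lands 0x100 bytes past where msga really is. */
    assert(first - msga == 0x100);

    printf("first lea  -> 0x%llx\n", (unsigned long long)first);
    printf("second lea -> 0x%llx\n", (unsigned long long)second);
    printf("msga is at    0x%llx\n", (unsigned long long)msga);
}
```

Calling print_lea_targets() from a main function reproduces the addresses in the objdump comments, which suggests the second lea is encoded correctly and only the first displacement is wrong.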
As an experiment, I decided to try putting both of the db sections into separate ASM/object files, and linking all 3 together. Doing so works.
main.s
global _main
extern _msga
extern _lena
extern _msgb
extern _lenb
section .text
_main:
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel _msga]
mov rdx, _lena
syscall
mov rax, 0x2000004
mov rdi, 1
lea rsi, [rel _msgb]
mov rdx, _lenb
syscall
mov rax, 0x2000001
mov rdi, 0
syscall
msga.s
global _msga
global _lena
section .data
_msga: db "Hello, world!", 10
_lena: equ $ - _msga
msgb.s
global _msgb
global _lenb
section .data
_msgb: db "Break things!", 10
_lenb: equ $ - _msgb
Compile and run with:
/usr/local/bin/nasm -f macho64 -o main.o main.s
/usr/local/bin/nasm -f macho64 -o msga.o msga.s
/usr/local/bin/nasm -f macho64 -o msgb.o msgb.s
clang -o main msga.o msgb.o main.o
./main
Results in:
Hello, world!
Break things!
While this does work, I find it hard to believe this is the best solution.
Surely there must be a way to have multiple db labels in one ASM file? Am I doing something wrong in the way I write the ASM? Is this a bug in NASM? Is this expected behavior somehow, in which case why? My workaround is extra work and clutter, so I would greatly appreciate any assistance in this.
Yes, it's a bug in Nasm-2.11.08. Nasm-2.11.06 should work. Nasm-2.11.09rc1 should work(?). Sorry 'bout that!
The related issue can be found here:
Bug 3392306 - Issue with relative addressing and data section
The build of 2.11.08 currently available through Homebrew fixes this issue by applying the following patch:
https://raw.githubusercontent.com/Homebrew/patches/7a329c65e/nasm/nasm_outmac64.patch
From 4920a0324348716d6ab5106e65508496241dc7a2 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <[email protected]>
Date: Sat, 9 May 2015 18:07:47 +0300
Subject: [PATCH] output: outmac64 -- Fix the case when first hit matches the
symbol
In case if we're looking up for a symbol and it's first
one in symbol table we might endup with error because of
using GE here (78f477b35f) ending cycle with @nearest = NULL.
http://bugzilla.nasm.us/show_bug.cgi?id=3392306
Reprted-by: Benjamin Randazzo <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
---
output/outmac64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/output/outmac64.c b/output/outmac64.c
index c07dcbc..1d30e64 100644
--- a/output/outmac64.c
+++ b/output/outmac64.c
@@ -304,7 +304,7 @@ static struct symbol *get_closest_section_symbol_by_offset(uint8_t fileindex, in
for (sym = syms; sym; sym = sym->next) {
if ((sym->sect != NO_SECT) && (sym->sect == fileindex)) {
- if ((int64_t)sym->value >= offset)
+ if ((int64_t)sym->value > offset)
break;
nearest = sym;
}
--
2.4.10.GIT
So if you are installing via Homebrew, this problem should now be resolved.
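To see why the one-character change matters, here is a simplified C sketch of the lookup loop. It is not NASM's actual code: the struct sym layout, the strict flag, and the function names are assumptions made for illustration, and the section check from the real function is omitted. The symbol offsets (msga at 0, msgb at 14) match the question's .data section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for an entry in the symbol list. */
struct sym {
    const char *name;
    int64_t value;        /* offset of the symbol within its section */
    struct sym *next;
};

/* Walk the list (sorted by offset) and return the last symbol that does
   not lie past `offset`. strict == 0 mimics the old `>=` comparison,
   strict != 0 mimics the patched `>` comparison. */
const struct sym *closest_symbol(const struct sym *syms, int64_t offset, int strict)
{
    const struct sym *nearest = NULL;
    for (const struct sym *s = syms; s; s = s->next) {
        if (strict ? (s->value > offset) : (s->value >= offset))
            break;
        nearest = s;
    }
    return nearest;
}

/* Look up offset 0, i.e. the very first symbol in the section. */
void demo_first_symbol_lookup(void)
{
    struct sym msgb = { "msgb", 14, NULL };
    struct sym msga = { "msga", 0, &msgb };

    /* Old comparison: 0 >= 0 breaks immediately, nearest stays NULL. */
    assert(closest_symbol(&msga, 0, 0) == NULL);

    /* Patched comparison: 0 > 0 is false, so msga is recorded first. */
    assert(closest_symbol(&msga, 0, 1) == &msga);
}
```

With the old comparison, a lookup that lands exactly on the first symbol ends the loop before anything is recorded, which is the @nearest = NULL case the commit message describes.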