I am remote debugging code running in QEMU with GDB, based on an os-dev tutorial.
My version is here. The problem only happens when remote-debugging code inside qemu, not when building a normal executable to run directly inside GDB under the normal OS.
Code looks something like this:
#define BUFSIZE 255
static char buf[BUFSIZE];
void foo() {
    // Making sure it's all zero.
    for (int i = 0; i < BUFSIZE; i++) buf[i] = 0;
    // Setting first char:
    buf[0] = 'a';
    // >> insert breakpoint right after setting the char <<
    // Prints 'a'.
    printf("%s", buf);
}
If I place a breakpoint at the marked spot and print the buffer with p buf, I get random values from random places, seemingly from my code section. If I get the address with p &buf, I get something that does not look correct, for two reasons. First, if I do char* p_buf = buf and check the address with p p_buf, it gives me a totally different address, which is stable across executions (the other was not). Second, if I then inspect that memory with x /255b 0x____, I can see the a followed by zeros (97 0 0 0 ... 0).
The next statement (printf("%s", buf);) does actually print a.
This leaves me believing that GDB might not know the correct location when I inspect the static variable directly.
Where should I start debugging this?
Details about the compile conditions:
-g -Wall -Wextra -pedantic -nostdlib -nostdinc -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -m32
Example output from GDB:
(gdb) p buf
$1 = "dfghjkl;'`\000\\zxcvbnm,./\000*\000 ", '\000' <repeats 198 times>...
(gdb) p p_buf
$2 = 0x40c0 <buf+224> "a"
(gdb) p &buf
$3 = (char (*)[255]) 0x3fe0 <buf>
(gdb) info address buf
Symbol "buf" is static storage at address 0x3fe0.
Update 2:
Disassembled a version of the code that shows the discrepancy:
; void foo
0x19f1 <foo> push %ebp
0x19f2 <foo+1> mov %esp,%ebp
0x19f4 <foo+3> sub $0x10,%esp
; char* p_buf = char_buf; --> `p &char_buf` is 0x4040 (incorrect) but `p p_buf` is 0x4100
0x19f7 <foo+6> movl $0x4100,-0x4(%ebp)
; void* p_p_buf = (void*)p_buf; --> `p p_p_buf` gives 0x4100
0x19fe <foo+13> mov -0x4(%ebp),%eax
0x1a01 <foo+16> mov %eax,-0x8(%ebp)
; void* p_char_buf = (void*)&char_buf; --> `p p_char_buf` gives 0x4100
0x1a04 <foo+19> movl $0x4100,-0xc(%ebp)
; char_buf[0] = 'a'; --> correct address
0x1a0b <foo+26> movb $0x61,0x4100
; char_buf[1] = 'b'; --> correct address (asking `p &char_buf` here is still incorrectly 0x4040)
0x1a12 <foo+33> movb $0x62,0x4101
; void foo return
0x1a19 <foo+40> nop
0x1a1a <foo+41> leave
0x1a1b <foo+42> ret
My Makefile for building the project looks like:
C_SOURCES = $(wildcard kernel/*.c drivers/*.c)
C_HEADERS = $(wildcard kernel/*.h drivers/*.h)
OBJ = ${C_SOURCES:.c=.o kernel/interrupt_table.o}
CC = /home/itarato/code/os/i386elfgcc/bin/i386-elf-gcc
# GDB = /home/itarato/code/os/i386elfgcc/bin/i386-elf-gdb
GDB = /usr/bin/gdb
CFLAGS = -g -Wall -Wextra -ffreestanding -fno-exceptions -pedantic -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -m32
QEMU = qemu-system-i386
os-image.bin: boot/boot.bin kernel.bin
	cat $^ > $@

kernel.bin: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^ --oformat binary

kernel.elf: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^

kernel.dis: kernel.bin
	ndisasm -b 32 $< > $@

run: os-image.bin
	${QEMU} -drive format=raw,media=disk,file=$<,index=0,if=floppy

debug: os-image.bin kernel.elf
	${QEMU} -s -S -drive format=raw,media=disk,file=$<,index=0,if=floppy &
	${GDB} -ex "target remote localhost:1234" -ex "symbol-file kernel.elf" -ex "tui enable" -ex "layout split" -ex "focus cmd"

%.o: %.c ${C_HEADERS}
	${CC} ${CFLAGS} -c $< -o $@

%.o: %.asm
	nasm $< -f elf -o $@

%.bin: %.asm
	nasm $< -f bin -o $@

build: os-image.bin
	echo Pass

clean:
	rm -rf *.bin *.o *.dis *.elf
	rm -rf kernel/*.o boot/*.bin boot/*.o
For me, this doesn't seem to happen:
Breakpoint 1, main () at test65.c:16
16 printf("%s", buf);
(gdb) p buf
$2 = "a", '\000' <repeats 253 times>
Where should I start debugging this?
It seems like there are two things that might go wrong: either GDB is reading buf from the wrong address, or the store to buf[0] has not happened yet when the breakpoint is hit.
I'm not sure what could cause the first, but it is easy enough to verify. Check what address p &buf gives you, then compare it to what you get from p_buf and to what info address buf shows you.
Note that due to address space layout randomization, the address of static variables will change when you start the process. So before the run command the address could be, e.g., 0x4040 and then change to 0x555555558040 once the code is running:
(gdb) info address buf
Symbol "buf" is static storage at address 0x4040.
(gdb) run
....
Breakpoint 1, main () at test65.c:16
16 printf("%s", buf);
(gdb) p &buf
$1 = (char (*)[255]) 0x555555558040 <buf>
(gdb) info address buf
Symbol "buf" is static storage at address 0x555555558040.
It sounds like a typical debugging problem caused by compiler optimizations. For example, the compiler might move the store buf[0] = 'a' after the point where your breakpoint lands, though it must perform it before printf() gets called. You could try compiling with -O0 to see if it changes anything.
You can also check the disassembly with the disas command to see what has executed up to that point:
(gdb) disas
Dump of assembler code for function main:
0x000055555555517b <+50>: movb $0x61,0x2ebe(%rip) # 0x555555558040 <buf>
=> 0x0000555555555182 <+57>: lea 0x2eb7(%rip),%rsi # 0x555555558040 <buf>
0x0000555555555189 <+64>: lea 0xe74(%rip),%rdi # 0x555555556004
0x0000555555555190 <+71>: mov $0x0,%eax
0x0000555555555195 <+76>: callq 0x555555555050 <printf@plt>
For me, the breakpoint lands right after the movb that stores 0x61 (the letter a) to buf.
If you use the stepi command until you are at the callq printf instruction, you can be sure you see the buffer exactly as printf would see it.
This is an interesting problem. It comes down to the fact that the code generated by LD (the linker) for the ELF executable kernel.elf is different from the code generated by LD for kernel.bin when using the --oformat binary option. While one would expect these to be the same, they are not.
More simply put, these Makefile rules do not produce the same code as you might expect:
kernel.elf: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^
and
kernel.bin: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^ --oformat binary
It appears the difference is in how the linker aligns the sections with and without --oformat binary. The ELF file (and the symbols used for debugging) places code and data at one set of offsets, while the binary file that is actually running in QEMU had code and data generated at different offsets.
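The numbers in the question are consistent with this kind of layout shift. As a small illustration (symbol_shift is a hypothetical helper name, not something from the project), the distance between the address GDB reports for buf and the address the running code uses can be computed like this:

```c
#include <stdint.h>

/* Hypothetical helper: distance in bytes from the address GDB reports for a
   symbol (read from kernel.elf) to the address the running code actually
   uses (as laid out in kernel.bin). */
int32_t symbol_shift(uint32_t gdb_addr, uint32_t runtime_addr) {
    return (int32_t)(runtime_addr - gdb_addr);
}
```

With the values from the first GDB session (p &buf gives 0x3fe0, p p_buf gives 0x40c0), symbol_shift(0x3fe0, 0x40c0) is 0xe0, i.e. 224 bytes, the same 224 that GDB prints in <buf+224>. For the disassembled build, symbol_shift(0x4040, 0x4100) is 0xc0. The shift is constant within one build, which is what you would expect if the data was simply placed at a different offset in the two link outputs.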
I had never observed this issue because I use my own linker scripts and always generate the binary file from the ELF executable with OBJCOPY rather than using LD to link twice. OBJCOPY can take an ELF executable and convert it to a binary file. The Makefile rules could be amended to look like:
kernel.bin: kernel.elf
	i386-elf-objcopy -O binary $^ $@

kernel.elf: boot/kernel_entry.o ${OBJ}
	i386-elf-ld -o $@ -Ttext 0x1000 $^
Doing it this way will ensure the binary file that is generated matches what was produced for the ELF executable.
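If you want to confirm that the two ways of producing the flat binary really differ on your setup, you can compare the old kernel.bin (linked with --oformat binary) against one produced by OBJCOPY. The standard cmp tool does this already; the following is only a minimal sketch of the same check in C, and first_mismatch is a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: returns the offset of the first differing byte
   between two files, -1 if they are identical, or -2 if either file cannot
   be opened. If one file is a prefix of the other, the returned offset is
   the length of the shorter file. */
long first_mismatch(const char *path_a, const char *path_b) {
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long offset = 0;
    long result = -1;

    if (!a || !b) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -2;
    }
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) {          /* differing byte, or one file ended early */
            result = offset;
            break;
        }
        if (ca == EOF) break;    /* both files ended together: identical */
        offset++;
    }
    fclose(a);
    fclose(b);
    return result;
}
```

Calling first_mismatch("kernel.bin", "kernel.objcopy.bin"), where kernel.objcopy.bin is an assumed name for the OBJCOPY output, should return -1 if the images match. Any other non-negative value is the file offset where they first diverge.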