This is a very strange problem which occurs only when the program is compiled with the -fPIC option.
Using gdb, I'm able to print thread-local variables, but stepping over them leads to a crash.
thread.c
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#define MAX_NUMBER_OF_THREADS 2
struct mystruct {
int x;
int y;
};
__thread struct mystruct obj;
void* threadMain(void *args) {
obj.x = 1;
obj.y = 2;
printf("obj.x = %d\n", obj.x);
printf("obj.y = %d\n", obj.y);
return NULL;
}
int main(int argc, char *arg[]) {
pthread_t tid[MAX_NUMBER_OF_THREADS];
int i = 0;
for(i = 0; i < MAX_NUMBER_OF_THREADS; i++) {
pthread_create(&tid[i], NULL, threadMain, NULL);
}
for(i = 0; i < MAX_NUMBER_OF_THREADS; i++) {
pthread_join(tid[i], NULL);
}
return 0;
}
Compile it using the following: gcc -g -lpthread thread.c -o thread -fPIC
Then while debugging it: gdb ./thread
(gdb) b threadMain
Breakpoint 1 at 0x4006a5: file thread.c, line 15.
(gdb) r
Starting program: /junk/test/thread
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7fc7700 (LWP 31297)]
[Switching to Thread 0x7ffff7fc7700 (LWP 31297)]
Breakpoint 1, threadMain (args=0x0) at thread.c:15
15 obj.x = 1;
(gdb) p obj.x
$1 = 0
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
threadMain (args=0x0) at thread.c:15
15 obj.x = 1;
However, if I compile it without -fPIC, the problem doesn't occur.
Before anybody asks why I am using -fPIC: this is just a reduced test case. We have a huge component which compiles into a .so file, which then plugs into another component. Therefore, -fPIC is necessary.
There is no functional impact; the only consequence is that debugging is nearly impossible.
Platform information: Linux 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux, Red Hat Enterprise Linux Server release 6.5 (Santiago)
Reproducible on the following as well:
Linux 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 15:20:27
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
The problem lies deep in the bowels of GAS, the GNU assembler, and how it generates DWARF debug information.
The compiler, GCC, is responsible for generating a specific sequence of instructions for a position-independent thread-local access. The sequence is documented in ELF Handling for Thread-Local Storage, page 22, section 4.1.6: x86-64 General Dynamic TLS Model. This sequence is:
0x00 .byte 0x66
0x01 leaq x@tlsgd(%rip),%rdi
0x08 .word 0x6666
0x0a rex64
0x0b call __tls_get_addr@plt
The sequence is laid out this way because the 16 bytes it occupies leave space for backend/assembler/linker optimizations. Indeed, your compiler generates the following assembler for threadMain():
threadMain:
.LFB2:
.file 1 "thread.c"
.loc 1 14 0
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %rdi, -8(%rbp)
.loc 1 15 0
.byte 0x66
leaq obj@tlsgd(%rip), %rdi
.value 0x6666
rex64
call __tls_get_addr@PLT
movl $1, (%rax)
.loc 1 16 0
...
This code, which contains a function call (!), is then relaxed by the toolchain down to just two instructions in the final binary: a mov with an fs: segment override, and a lea. Together they occupy 16 bytes, demonstrating why the General Dynamic Model instruction sequence is designed to require 16 bytes.
(gdb) disas/r threadMain
Dump of assembler code for function threadMain:
0x00000000004007f0 <+0>: 55 push %rbp
0x00000000004007f1 <+1>: 48 89 e5 mov %rsp,%rbp
0x00000000004007f4 <+4>: 48 83 ec 10 sub $0x10,%rsp
0x00000000004007f8 <+8>: 48 89 7d f8 mov %rdi,-0x8(%rbp)
0x00000000004007fc <+12>: 64 48 8b 04 25 00 00 00 00 mov %fs:0x0,%rax
0x0000000000400805 <+21>: 48 8d 80 f8 ff ff ff lea -0x8(%rax),%rax
0x000000000040080c <+28>: c7 00 01 00 00 00 movl $0x1,(%rax)
So far, everything has been done correctly. The problem now begins as GAS generates DWARF debug information for your particular assembler code.
While parsing line-by-line in binutils-x.y.z/gas/read.c, function void read_a_source_file (char *name), GAS encounters .loc 1 15 0, the statement that begins the next line, and runs the handler void dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) in dwarf2dbg.c. Unfortunately, the handler does not unconditionally emit debug information for the current offset within the "fragment" (frag_now) of machine code it is currently building. It could have done this by calling dwarf2_emit_insn(0), but the .loc handler currently only does so if it sees multiple .loc directives consecutively. Instead, in our case it continues on to the next line, leaving the debug information unemitted.
On the next line it sees the .byte 0x66 directive of the General Dynamic sequence. This is not, in and of itself, part of an instruction, despite representing the data16 instruction prefix in x86 assembly. GAS acts upon it with the handler cons_worker(), and the fragment increases from 12 bytes to 13 in size.
On the next line it sees a true instruction, leaq, which is parsed by calling the macro assemble_one() that maps to void md_assemble (char *line) in gas/config/tc-i386.c. At the very end of that function, output_insn() is called, which itself finally calls dwarf2_emit_insn(0) and causes debug information to be emitted at last. A new Line Number Statement (LNS) is begun that claims that line 15 began at function-start-address plus previous fragment size, but since we passed over the .byte statement before doing so, the fragment is 1 byte too large, and the computed offset for the first instruction of line 15 is therefore 1 byte off.
Some time later, the General Dynamic sequence is relaxed to the final instruction sequence that starts with mov %fs:0x0,%rax. The code size and all offsets remain unchanged because both sequences of instructions are 16 bytes. The debug information is unchanged, and still wrong.
GDB, when it reads the Line Number Statements, is told that the prologue of threadMain(), which is associated with line 14, where its signature is found, ends where line 15 begins. GDB dutifully plants a breakpoint at that location, but unfortunately it is 1 byte too far.
When run without a breakpoint, the program runs normally, and sees:
64 48 8b 04 25 00 00 00 00 mov %fs:0x0,%rax
Correctly placing the breakpoint would involve saving the first byte of the instruction and replacing it with int3 (opcode 0xcc), leaving:
cc int3
48 8b 04 25 00 00 00 00 mov (0x0),%rax
The normal step-over sequence would then involve restoring the first byte of the instruction, setting the program counter (rip on x86-64) to the address of that breakpoint, single-stepping, re-inserting the breakpoint, then continuing the program.
However, when GDB plants the breakpoint at the incorrect address 1 byte too far, the program sees instead
64 cc fs:int3
8b 04 25 00 00 00 00 <garbage>
which is a weird but still valid breakpoint. That's why you didn't see SIGILL (illegal instruction).
Now, when GDB attempts to step over, it restores the instruction byte, sets the PC to the address of the breakpoint, and this is what it sees now:
64 fs: # CPU DOESN'T SEE THIS!
48 8b 04 25 00 00 00 00 mov (0x0),%rax # <- CPU EXECUTES STARTING HERE!
# BOOM! SEGFAULT!
Because GDB restarted execution one byte too far, the CPU does not decode the fs: instruction prefix byte, and instead executes mov (0x0),%rax with the default segment, which is ds: (data). This immediately results in a read from address 0, the null pointer. The SIGSEGV promptly follows.
All due credits to Mark Plotnick for essentially nailing this.
The solution that was retained is to binary-patch cc1, gcc's actual C compiler, to emit data16 instead of .byte 0x66. This results in GAS parsing the prefix and instruction combination as a single unit, yielding the correct offset in the debug information.