Program compiled with -fPIC crashes while stepping over thread-local variable in GDB

Tags: c, linux, gcc, gdb, pthreads

This is a very strange problem which occurs only when the program is compiled with the -fPIC option.

Using gdb I'm able to print thread-local variables, but stepping over them leads to a crash.

thread.c

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

#define MAX_NUMBER_OF_THREADS 2

struct mystruct {
    int   x;
    int   y;
};

__thread struct mystruct obj;

void* threadMain(void *args) {
    obj.x = 1;
    obj.y = 2;

    printf("obj.x = %d\n", obj.x);
    printf("obj.y = %d\n", obj.y);

    return NULL;
}

int main(int argc, char *arg[]) {
    pthread_t tid[MAX_NUMBER_OF_THREADS];
    int i = 0;

    for(i = 0; i < MAX_NUMBER_OF_THREADS; i++) {
        pthread_create(&tid[i], NULL, threadMain, NULL);
    }

    for(i = 0; i < MAX_NUMBER_OF_THREADS; i++) {
        pthread_join(tid[i], NULL);
    }

    return 0;
}

Compile it using the following: gcc -g -lpthread thread.c -o thread -fPIC

Then while debugging it: gdb ./thread

(gdb) b threadMain 
Breakpoint 1 at 0x4006a5: file thread.c, line 15.
(gdb) r
Starting program: /junk/test/thread 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7fc7700 (LWP 31297)]
[Switching to Thread 0x7ffff7fc7700 (LWP 31297)]

Breakpoint 1, threadMain (args=0x0) at thread.c:15
15      obj.x = 1;
(gdb) p obj.x
$1 = 0
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
threadMain (args=0x0) at thread.c:15
15      obj.x = 1;

If I compile it without -fPIC, the problem doesn't occur.

Before anybody asks why I am using -fPIC: this is just a reduced test case. We have a huge component which compiles into a .so file which then plugs into another component. Therefore, -fPIC is necessary.

There is no functional impact; the only consequence is that debugging is nearly impossible.

Platform Information: Linux 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux, Red Hat Enterprise Linux Server release 6.5 (Santiago)

Also reproducible on the following:

Linux 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 15:20:27 
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4


asked Oct 30 '15 06:10

Kartik Anand

1 Answer

The problem lies deep in the bowels of GAS, the GNU assembler, and how it generates DWARF debug information.

The compiler, GCC, is responsible for generating a specific sequence of instructions for a position-independent thread-local access. That sequence is specified in ELF Handling for Thread-Local Storage, page 22, section 4.1.6: x86-64 General Dynamic TLS Model. It is:

0x00 .byte 0x66
0x01 leaq  x@tlsgd(%rip),%rdi
0x08 .word 0x6666
0x0a rex64
0x0b call __tls_get_addr@plt

The sequence is padded to exactly 16 bytes so that the backend, assembler, or linker has room to optimize it in place. Indeed, your compiler generates the following assembly for threadMain():

threadMain:
.LFB2:
        .file 1 "thread.c"
        .loc 1 14 0
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movq    %rdi, -8(%rbp)
        .loc 1 15 0
        .byte   0x66
        leaq    obj@tlsgd(%rip), %rdi
        .value  0x6666
        rex64
        call    __tls_get_addr@PLT
        movl    $1, (%rax)
        .loc 1 16 0
        ...

This code, which contains a function call (!), is then relaxed at link time down to just two instructions in the final executable:

a mov with an fs: segment override, and
a lea

Together they occupy 16 bytes (9 for the mov, 7 for the lea), which is why the General Dynamic instruction sequence is designed to take up 16 bytes.

(gdb) disas/r threadMain
Dump of assembler code for function threadMain:
   0x00000000004007f0 <+0>:     55      push   %rbp
   0x00000000004007f1 <+1>:     48 89 e5        mov    %rsp,%rbp
   0x00000000004007f4 <+4>:     48 83 ec 10     sub    $0x10,%rsp
   0x00000000004007f8 <+8>:     48 89 7d f8     mov    %rdi,-0x8(%rbp)
   0x00000000004007fc <+12>:    64 48 8b 04 25 00 00 00 00      mov    %fs:0x0,%rax
   0x0000000000400805 <+21>:    48 8d 80 f8 ff ff ff    lea    -0x8(%rax),%rax
   0x000000000040080c <+28>:    c7 00 01 00 00 00       movl   $0x1,(%rax)

So far, everything has been done correctly. The problem now begins as GAS generates DWARF debug information for your particular assembler code.

While parsing line-by-line in binutils-x.y.z/gas/read.c, function void read_a_source_file (char *name), GAS encounters .loc 1 15 0, the statement that begins the next line, and runs the handler void dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) in dwarf2dbg.c. Unfortunately, the handler does not unconditionally emit debug information for the current offset within the "fragment" (frag_now) of machine code it is currently building. It could have done this by calling dwarf2_emit_insn(0), but the .loc handler currently only does so if it sees multiple .loc directives consecutively. Instead, in our case it continues on to the next line, leaving the debug information unemitted.
On the next line it sees the .byte 0x66 directive of the General Dynamic sequence. This is not, in and of itself, part of an instruction, despite representing the data16 instruction prefix in x86 assembly. GAS acts upon it with the handler cons_worker(), and the fragment increases from 12 bytes to 13 in size.
On the next line it sees a true instruction, leaq, which is parsed by calling the macro assemble_one() that maps to void md_assemble (char *line) in gas/config/tc-i386.c. At the very end of that function, output_insn() is called, which itself finally calls dwarf2_emit_insn(0) and causes debug information to be emitted at last. A new Line Number Statement (LNS) is begun that claims that line 15 began at function-start-address plus previous fragment size, but since we passed over the .byte statement before doing so, the fragment is 1 byte too large, and the computed offset for the first instruction of line 15 is therefore 1 byte off.
Some time later, at link time, the General Dynamic sequence is relaxed to the final instruction sequence that starts with mov %fs:0x0,%rax. The code size and all offsets remain unchanged because both sequences of instructions are 16 bytes long. The debug information is unchanged, and still wrong.

GDB, when it reads the Line Number Statements, is told that the prologue of threadMain(), which is associated with the line 14 on which is found its signature, ends where line 15 begins. GDB dutifully plants a breakpoint at that location, but unfortunately it is 1 byte too far.

When run without a breakpoint, the program runs normally, and the CPU sees

64 48 8b 04 25 00 00 00 00      mov    %fs:0x0,%rax

Correctly placing the breakpoint would involve saving the first byte of the instruction and replacing it with int3 (opcode 0xcc), leaving

cc                              int3
48 8b 04 25 00 00 00 00         mov    (0x0),%rax

The normal step-over sequence would then involve restoring the first byte of the instruction, setting the program counter (rip on x86-64) to the address of that breakpoint, single-stepping, re-inserting the breakpoint, then continuing the program.

However, when GDB plants the breakpoint at the incorrect address 1 byte too far, the program sees instead

64 cc                           fs:int3
8b 04 25 00 00 00 00            <garbage>

which is a weird but still valid breakpoint. That's why you didn't see SIGILL (illegal instruction).

Now, when GDB attempts to step over, it restores the instruction byte and sets the PC to the address of the breakpoint. This is what the CPU then sees:

64                              fs:                # CPU DOESN'T SEE THIS!
48 8b 04 25 00 00 00 00         mov    (0x0),%rax  # <- CPU EXECUTES STARTING HERE!
# BOOM! SEGFAULT!

Because GDB restarted execution one byte too far, the CPU does not decode the fs: instruction prefix byte, and instead executes mov (0x0),%rax with the default segment, which is ds: (data). This immediately results in a read from address 0, the null pointer. The SIGSEGV promptly follows.

All due credits to Mark Plotnick for essentially nailing this.

The solution that was retained is to binary-patch cc1, gcc's actual C compiler, to emit data16 instead of .byte 0x66. This results in GAS parsing the prefix and instruction combination as a single unit, yielding the correct offset in the debug information.


answered Oct 15 '22 06:10

Iwillnotexist Idonotexist
