
Implementing thread-local storage in custom libc

I'm implementing a small subset of libc for very small and statically linked programs, and I figured that adding TLS support would be a good learning experience. I use Ulrich Drepper's TLS document as a reference.

I have two strings set up to try this out:

static __thread const char msg1[] = "TLS (1).\n"; /* 10 bytes */
static __thread const char msg2[] = "TLS (2).\n"; /* 10 bytes */

And the compiler generates the following instructions to access them:

mov    rbx, QWORD PTR fs:0x0 ; Load TLS.
lea    rsi, [rbx-0x14]       ; Get a pointer to 'msg1'. 20 byte offset.
lea    rsi, [rbx-0xa]        ; Get a pointer to 'msg2'. 10 byte offset.

Let's assume I place the TCB somewhere on the stack:

struct tcb {
    void* self; /* Points to self. I read that this was necessary somewhere. */
    int errno;  /* Per-thread errno variable. */
    int padding;
};

And then place the TLS area just next to it at tls = &tcb - tls_size. Then I set the FS register to point at fs = tls + tls_size, and copy the TLS initialization image to tls.
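The layout described above can be sketched in plain C without touching the FS register, which makes it easy to check the arithmetic in isolation. This is a minimal sketch under stated assumptions, not a definitive implementation: `tls_layout` and `round_up` are hypothetical helpers, the `errno` field is renamed to `err` so the block also compiles against a hosted libc, and the `image`, `filesz`, `memsz` and `align` parameters are assumed to come from the executable's PT_TLS program header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tcb {
    void *self;   /* fs:0 must hold the thread pointer itself on x86-64 */
    int err;      /* per-thread errno; renamed because errno is a macro in hosted libcs */
    int padding;
};

/* Round n up to a multiple of a (a must be a power of two). */
static uintptr_t round_up(uintptr_t n, uintptr_t a)
{
    return (n + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: build [ TLS block | TCB ] inside buf and return the
 * thread pointer, i.e. the address that would be installed as the FS base.
 * image/filesz/memsz/align are assumed to describe the PT_TLS segment.
 */
static struct tcb *tls_layout(void *buf, size_t buf_size,
                              const void *image, size_t filesz,
                              size_t memsz, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(filesz <= memsz);

    /* Variable offsets are relative to the block size rounded up to align. */
    size_t block = round_up(memsz, align);
    size_t tp_align = align > _Alignof(struct tcb) ? align : _Alignof(struct tcb);

    /* The TLS block ends exactly where the TCB begins. */
    uintptr_t tp = round_up((uintptr_t)buf + block, tp_align);
    if (tp + sizeof(struct tcb) > (uintptr_t)buf + buf_size)
        return NULL;                           /* buffer too small */

    unsigned char *tls = (unsigned char *)(tp - block);
    memcpy(tls, image, filesz);                /* initialized data (.tdata) */
    memset(tls + filesz, 0, memsz - filesz);   /* zero-filled data (.tbss) */

    struct tcb *tcb = (struct tcb *)tp;
    tcb->self = tcb;
    tcb->err = 0;
    tcb->padding = 0;
    return tcb;
}
```

With a 20-byte image and an alignment of 1, the two strings land at the returned pointer minus 0x14 and minus 0xa, matching the offsets in the generated code. The returned pointer is the value that would then be passed to arch_prctl with ARCH_SET_FS.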

However, this doesn't work. I have verified that I locate the TLS initialization image properly by writing the 20 bytes at tls_image to stdout. This leads me to believe that either I place the TCB and/or TLS area incorrectly, or I'm otherwise not conforming to the ABI.
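For reference, one common way to locate the TLS initialization image is to walk the program headers and pick out the PT_TLS entry. The post does not say how the image was found, so the following is only a sketch under assumptions: it runs on Linux against a hosted libc that provides `getauxval` (a custom libc would read the auxiliary vector from the initial stack instead), and `find_tls_image`, `struct tls_image` and `tls_probe` are hypothetical names. It also assumes that a missing PT_PHDR entry means the program is loaded at its link-time addresses.

```c
#include <assert.h>
#include <elf.h>       /* PT_TLS, PT_PHDR */
#include <link.h>      /* ElfW() */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval(), AT_PHDR, AT_PHNUM */

/* Hypothetical variable so that this program has a PT_TLS segment at all. */
__thread char tls_probe[] = "TLS probe";

struct tls_image {
    const void *data;  /* start of the initialization image */
    size_t filesz;     /* bytes of initialized data to copy */
    size_t memsz;      /* total block size, including zero-filled data */
    size_t align;      /* required alignment of the block */
};

/* Fill *out from the PT_TLS program header; returns 0 on success, -1 if none. */
static int find_tls_image(struct tls_image *out)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    size_t phnum = getauxval(AT_PHNUM);
    uintptr_t base = 0;   /* load bias; stays 0 for non-PIE executables */

    assert(out != NULL);
    if (phdr == NULL)
        return -1;

    /* PT_PHDR records where the headers were linked; the difference is the bias. */
    for (size_t i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_PHDR)
            base = (uintptr_t)phdr - phdr[i].p_vaddr;

    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type == PT_TLS) {
            out->data = (const void *)(base + phdr[i].p_vaddr);
            out->filesz = phdr[i].p_filesz;
            out->memsz = phdr[i].p_memsz;
            out->align = phdr[i].p_align;
            return 0;
        }
    }
    return -1;
}
```

The `filesz`, `memsz` and `align` values are the ones that determine how large the TLS area must be and how the block has to be aligned below the TCB.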

  • I set the FS register using arch_prctl(2). Do I need to use set_thread_area(2) somehow?
  • I don't have a dtv. I'm assuming this isn't necessary since I am linking statically.

Any ideas as to what I'm doing wrong? Thanks a lot!

asked Jan 25 '13 21:01 by haste


1 Answer

I'm implementing a small subset of libc for very small and statically linked programs, and I figured that adding TLS support would be a good learning experience.

Awesome idea! I had to implement my own TLS in a project because I could not use any common thread library like pthread. I do not have a complete solution for your problem, but sharing my experience could be useful.

I set the FS register using arch_prctl(2). Do I need to use set_thread_area(2) somehow?

The answer depends on the architecture you are using. On x86-64, use arch_prctl exclusively to point the FS register at the memory area you want to use as TLS; it takes a full 64-bit base address, so the area can live above 4 GB. On 32-bit x86, you must use set_thread_area instead, as it is the only system call the kernel supports for this purpose there.

The idea behind my implementation is to allocate a private memory area for each thread and save its address as the base of the %GS register (on x86-64 Linux, glibc keeps its own thread pointer in %FS, so %GS is free). It is a rather simple method, but it worked quite well in my case. Each time you want to access a thread's private area, you use the value saved in %GS as the base address plus an offset that identifies a memory location. I usually allocate one memory page (4096 bytes) for each thread and divide it into 8-byte slots. That gives 512 private slots per thread, which can be accessed like an array with indexes from 0 to 511.

This is the code I use:

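#define _GNU_SOURCE 1

#include "tls.h"
#include <stddef.h>
#include <sys/mman.h>
#include <asm/prctl.h>     /* ARCH_SET_GS, ARCH_GET_GS */
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall() */

#define TLS_AREA_SIZE 4096
#define TLS_SLOTS     (TLS_AREA_SIZE / 8)

/* Map one private page and point the GS base at it. Returns NULL on failure. */
void *install_tls(void) {
    void *addr = mmap(NULL, TLS_AREA_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return NULL;
    if (syscall(SYS_arch_prctl, ARCH_SET_GS, addr) < 0) {
        munmap(addr, TLS_AREA_SIZE);
        return NULL;
    }
    return addr;
}

/* Unmap the calling thread's area, found through the current GS base. */
void freeTLS(void) {
    unsigned long addr = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_GS, &addr) == 0 && addr != 0)
        munmap((void *)addr, TLS_AREA_SIZE);
}

/* Store val in slot idx (8 bytes per slot). */
bool set_tls_value(int idx, unsigned long val) {
    if (idx < 0 || idx >= TLS_SLOTS)
        return false;
    asm volatile("movq %0, %%gs:(%1)"
                 :
                 : "r"(val), "r"(8UL * idx)
                 : "memory");
    return true;
}

/* Load slot idx. Returns 0 for an out-of-range index. */
unsigned long get_tls_value(int idx) {
    unsigned long rc;
    if (idx < 0 || idx >= TLS_SLOTS)
        return 0;
    asm volatile("movq %%gs:(%1), %0"
                 : "=r"(rc)
                 : "r"(8UL * idx)
                 : "memory");
    return rc;
}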

This is the header with some macros:

#ifndef TLS_H
#define TLS_H

#include <stdbool.h>

void *install_tls(void);
void freeTLS(void);
bool set_tls_value(int, unsigned long);
unsigned long get_tls_value(int);

/*
 * Macros used to set and retrieve the values from the TLS area.
 * The constants are slot indexes, not byte offsets: set_tls_value and
 * get_tls_value multiply them by 8, so they select bytes 0, 64 and 128.
 */

#define TLS_TID 0x0
#define TLS_FD  0x8
#define TLS_MONITORED 0x10

#define set_local_tid(_x) \
    set_tls_value(TLS_TID, (unsigned long)(_x))

#define set_local_fd(_x) \
    set_tls_value(TLS_FD, (unsigned long)(_x))

#define set_local_monitored(_x) \
    set_tls_value(TLS_MONITORED, (unsigned long)(_x))

#define get_local_tid() \
    get_tls_value(TLS_TID)

#define get_local_fd() \
    get_tls_value(TLS_FD)

#define get_local_monitored() \
    get_tls_value(TLS_MONITORED)



#endif /* end of include guard: TLS_H */

The first thing each thread must do is install its TLS memory area. Once the area has been initialized, the thread can start using it as private thread-local storage.

answered Sep 21 '22 13:09 by Giuseppe Pes