
Destruction of static class members in Thread local storage

I'm writing a fast multi-threaded program, and I want to avoid synchronization (the function that would need to be synchronized is called something like 5,000,000 times per second, so even a mutex would be too heavy).

The scenario is: I have a single global instance of a class, and every thread can access it. To avoid synchronization, all the data inside the class is accessed read-only, except for a handful of class members, which are declared in TLS (with __thread or __declspec(thread)).

Unfortunately, in order to use the __thread interface offered by the compiler, the class members have to be static and cannot have constructors or destructors. The classes I use do have custom constructors, so I'm declaring the class members as pointers to those classes (something like static __thread MyClass* _object).

Then, the first time a thread calls a method of the global instance, I do something like "if (_object == NULL) _object = new MyClass(...)".

My biggest problem is: is there a smart way to free this allocated memory? The global class comes from a library and is used by many threads in the program. Each thread is created in a different way (i.e. each thread executes a different function), so I can't add a snippet of cleanup code at every point where a thread is about to terminate. Thank you guys.

Gianluca asked Jan 04 '11 09:01


3 Answers

In C++11 this is easily achieved:

static thread_local struct TlsCleaner {
    ~TlsCleaner() {
        cleanup_tls();
    }
} tls_cleaner;

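cleanup_tls() will then run whenever a thread terminates (provided the thread was created through the C++ API, e.g. std::thread). Be aware that some implementations construct thread_local objects lazily, so a thread may need to touch tls_cleaner at least once before its destructor is registered for that thread.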

But then, you could just as well clean up the TLS objects directly in their own destructors (which also run at thread exit). For example, static thread_local std::unique_ptr<MyClass> pMyClass; will delete the MyClass when the thread terminates.
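The unique_ptr variant could be sketched roughly as follows. This is only an illustration under assumptions: MyClass here is a stand-in for the asker's class, and per_thread_object(), run_threads() and the g_live counter are hypothetical names added to make the effect observable.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Demo-only counter of live MyClass instances (not part of the original answer).
static std::atomic<int> g_live{0};

// Hypothetical stand-in for the asker's class with a custom constructor.
struct MyClass {
    MyClass() { ++g_live; }
    ~MyClass() { --g_live; }
    void work() {}
};

// Returns the calling thread's instance, creating it on first use.
// The unique_ptr is destroyed at thread exit, which deletes the object.
static MyClass& per_thread_object() {
    static thread_local std::unique_ptr<MyClass> obj;
    if (!obj) {
        obj.reset(new MyClass());
    }
    assert(obj != nullptr);
    return *obj;
}

// Starts n threads that each touch their own instance, joins them,
// and returns how many instances are still alive afterwards.
int run_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([] { per_thread_object().work(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return g_live.load();
}
```

No thread function has to know about the cleanup; the only requirement is that all access goes through per_thread_object().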

Before C++11 you can use hacks like the GNU "linker sets" or MSVC "_tls_used" callback.

Or, starting with Windows Vista (NT 6.0), you can use FlsAlloc, which accepts a cleanup callback.

rustyx answered Sep 29 '22 06:09



TLS clean-up is usually done in DllMain when it is passed DLL_THREAD_DETACH.

If your code is all in an EXE and not a DLL then you could create a dummy DLL that the EXE loads which in turn calls back into the EXE on DLL_THREAD_DETACH. (I don't know of a better way to have EXE code run on thread termination.)

There are a couple of ways for the DLL to call back into the EXE: One is that EXEs can export functions just like DLLs, and the DLL code can use GetProcAddress on the EXE's module handle. An easier method is to give the DLL an init function which the EXE calls to explicitly pass a function pointer.

Note that what you can do within DllMain is limited (and unfortunately the limits are not properly documented), so you should minimize any work done this way. Don't run any complex destructors; just free the memory with a direct kernel32.dll API such as HeapFree, and free the TLS slot.

Also note that you won't get a DLL_THREAD_ATTACH for threads that were already running when your DLL was loaded (but you will still get DLL_THREAD_DETACH if they exit while the DLL is loaded), and that you'll get (only) a DLL_PROCESS_DETACH when the final thread exits.

Leo Davidson answered Sep 29 '22 04:09



If you just want a generic cleanup function, you can still use boost::thread_specific_ptr. You don't need to actually use the data stored there; you can simply take advantage of its custom cleanup function. Make that function do whatever you need at thread exit. For a direct pthreads equivalent, look at pthread_key_create, which takes a destructor callback.
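A minimal pthreads sketch of that idea, assuming a POSIX platform: the hot path still reads a plain thread-local pointer, and the key exists only so that its destructor runs at thread exit. MyClass, get_object(), run_threads() and the g_cleanups counter are hypothetical names for illustration.

```cpp
#include <pthread.h>

#include <atomic>
#include <cassert>
#include <vector>

// Demo-only counter of destructor callbacks that have run.
static std::atomic<int> g_cleanups{0};

// Hypothetical per-thread object with a non-trivial constructor.
struct MyClass {
    int value;
    MyClass() : value(0) {}
};

static pthread_key_t g_key;
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;

// Fast path: a plain thread-local pointer, as in the question.
static thread_local MyClass* t_object = nullptr;

// Called by pthreads at thread exit for each thread whose key value is non-NULL.
static void destroy_object(void* p) {
    delete static_cast<MyClass*>(p);
    ++g_cleanups;
}

static void make_key() {
    int rc = pthread_key_create(&g_key, destroy_object);
    assert(rc == 0);
    (void)rc;
}

// Returns the calling thread's object; only the first call per thread
// touches the key, later calls are a single pointer check.
MyClass* get_object() {
    if (t_object == nullptr) {
        pthread_once(&g_key_once, make_key);
        t_object = new MyClass();
        pthread_setspecific(g_key, t_object);  // registers the object for cleanup
    }
    return t_object;
}

static void* thread_main(void*) {
    get_object()->value = 42;
    return nullptr;
}

// Runs n threads to completion and returns the number of cleanups observed.
int run_threads(int n) {
    std::vector<pthread_t> tids(n);
    for (int i = 0; i < n; ++i) {
        pthread_create(&tids[i], nullptr, thread_main, nullptr);
    }
    for (int i = 0; i < n; ++i) {
        pthread_join(tids[i], nullptr);
    }
    return g_cleanups.load();
}
```

Because the key is created lazily inside get_object(), the library needs no cooperation from whoever starts the threads.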


There is unfortunately no easy answer, at least not that I've come across yet. That is, there is no common way to have complex objects deleted at thread exit time. However, there's nothing stopping you from doing this on your own.

You will need to register your own handlers to run at thread exit. With pthreads that would be pthread_cleanup_push (which must be paired with pthread_cleanup_pop in the same function). I don't know what the equivalent is on Windows. This is of course not cross-platform. But presumably you have full control over how the thread is started and over its entry point, so you could simply call a cleanup function explicitly just before returning from your thread. I know you mentioned you can't add this snippet, in which case you are left with calling the OS-specific function to register a cleanup routine.

Obviously creating cleanup functions for all objects allocated could be annoying. So instead you should create one more thread local variable: a list of destructors for objects. For each thread-specific variable you create you'll push a destructor onto this list. This list will have to be created on demand if you don't have a common thread entry point: have a global function to call which takes your destructor and creates the list as necessary, then adds the destructor.

Exactly what each destructor looks like depends heavily on your object hierarchy (you may have simple boost::bind expressions, shared_ptrs, a virtual destructor in a base class, or a combination thereof).

Your generic cleanup function can then walk through this list and perform all the destructors.
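The destructor-list idea might look like the sketch below. Note the substitution: instead of an OS-specific exit hook, it uses a C++11 thread_local holder whose own destructor walks the list, which is an assumption about the available toolchain rather than what this answer had in mind. CleanupList, at_thread_exit() and run_demo() are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Per-thread list of pending destructors; walked when the thread exits.
struct CleanupList {
    std::vector<std::function<void()>> destructors;

    ~CleanupList() {
        // Run in reverse registration order, like ordinary C++ destruction.
        for (auto it = destructors.rbegin(); it != destructors.rend(); ++it) {
            (*it)();
        }
    }
};

// Global registration function: creates the calling thread's list on demand,
// then appends the destructor to it.
void at_thread_exit(std::function<void()> fn) {
    static thread_local CleanupList list;
    list.destructors.push_back(std::move(fn));
}

// Demo: a worker registers two cleanups; the returned vector records the
// order in which they ran. It is only read after join(), so no locking is needed.
std::vector<int> run_demo() {
    std::vector<int> order;
    std::thread worker([&order] {
        int* a = new int(1);
        int* b = new int(2);
        at_thread_exit([a, &order] { order.push_back(*a); delete a; });
        at_thread_exit([b, &order] { order.push_back(*b); delete b; });
    });
    worker.join();
    return order;
}
```

On a pre-C++11 toolchain the same list could instead hang off a pthread key, with the walk done in the key's destructor callback.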

edA-qa mort-ora-y answered Sep 29 '22 05:09
