
Static Variables and Thread-Local Storage

Background:

I have discovered something of an interesting edge case relating to static memory initialization across multiple threads. Specifically, I am using Howard Hinnant's TZ library which has been working fine for the rest of my code across many different threads.

Now, I am developing a logging class which relies on yet another thread and a condition variable. Unfortunately, when I attempt to format a chrono time_point using date::make_zoned(date::locate_zone("UTC"), tp), the library crashes. Digging through tz.cpp, I find that the time zone database returned internally evaluates to NULL. This all comes from the following snippet:

tzdb_list&
get_tzdb_list()
{
    static tzdb_list tz_db = create_tzdb();
    return tz_db;
}

As can be seen, the database list is stored statically. With a few printf()s and some time with GDB I can see that the same db is returned for multiple calls from the main thread but returns NULL when called from my logger thread.
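For comparison, here is a minimal sketch of what a function-local static is expected to do since C++11: it is initialized exactly once, even if several threads make the first call concurrently, and every thread sees the same object. `SharedDb`, `get_shared_db()` and `count_distinct_addresses()` are hypothetical stand-ins for illustration, not part of the date library.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for tzdb_list that counts its constructions.
struct SharedDb
{
    static std::atomic<int> constructions;
    SharedDb() { ++constructions; }
};
std::atomic<int> SharedDb::constructions{0};

// Same shape as get_tzdb_list(): one function-local static returned by reference.
SharedDb& get_shared_db()
{
    static SharedDb db;  // initialized exactly once, even under concurrent first calls
    return db;
}

// Calls get_shared_db() from thread_count threads and returns the number
// of distinct object addresses those threads observed.
std::size_t count_distinct_addresses(std::size_t thread_count)
{
    assert(thread_count > 0);
    std::vector<const SharedDb*> seen(thread_count);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < thread_count; ++i)
        threads.emplace_back([&seen, i] { seen[i] = &get_shared_db(); });
    for (auto& t : threads)
        t.join();
    return std::set<const SharedDb*>(seen.begin(), seen.end()).size();
}
```

If a program shaped like this reports more than one address or more than one construction, the cause is likely outside the function itself (for example, memory corruption or an initialization problem elsewhere).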

If, however, I change the declaration of tzdb_list to:

static thread_local tzdb_list tz_db = create_tzdb();

Everything works as expected. This is not surprising as thread_local will cause each thread to do the heavy-lifting of creating a standalone instance of tzdb_list. Obviously this is wasteful of memory and can easily cause problems later. As such, I really don't see this as a viable solution.
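The cost of the thread_local variant can be illustrated with a small sketch: each thread that calls the accessor constructs its own instance on first use. `PerThreadDb`, `get_thread_db()` and `constructions_after()` are hypothetical names used only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for tzdb_list that counts its constructions.
struct PerThreadDb
{
    static std::atomic<int> constructions;
    PerThreadDb() { ++constructions; }
};
std::atomic<int> PerThreadDb::constructions{0};

// thread_local variant: one instance per thread, built on that thread's first call.
PerThreadDb& get_thread_db()
{
    static thread_local PerThreadDb db;
    return db;
}

// Starts thread_count threads that each call get_thread_db() twice,
// then returns the total number of constructions so far.
int constructions_after(int thread_count)
{
    assert(thread_count > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([] {
            get_thread_db();  // constructs this thread's instance
            get_thread_db();  // reuses it; no second construction
        });
    for (auto& t : threads)
        t.join();
    return PerThreadDb::constructions.load();
}
```

With an expensive object such as a parsed time zone database, that per-thread construction is the overhead described above.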

Questions:

  • What about the invocation of one thread versus another would cause static memory to behave differently? If anything, I would expect the opposite of what is happening (e.g. for the threads to 'fight' over initialized memory, not for one of them to receive a NULL pointer).

  • How is it possible for a returned static reference to have multiple different values in the first place (in my case, valid memory versus NULL)?

  • With thread_local built into the library I get wildly different memory locations on opposite ends of the addressable region; why? I suspect that this has to do with where thread memory is allocated versus the main process memory but do not know the exact details of thread allocation regions.

Reference:

My logging thread is created with:

outputThread = std::thread(Logger::outputHandler, &outputQueue);

And the actual output handler / invocation of the library (LogMessage is just a typedef for std::tuple):

void Logger::outputHandler(LogQueue *queue)
{
    LogMessage entry;
    std::stringstream ss;

    while (1)
    {
        queue->pop(entry);           // Blocks on a condition variable

        ss << date::make_zoned(date::locate_zone("UTC"), std::get<0>(entry))
           << ":" << levelId[std::get<1>(entry)]
           << ":" << std::get<3>(entry) << std::endl;

        // Printing stuff

        ss.str("");
        ss.clear();
    }
}

Additional code and output samples available on request.


EDIT 1

This is definitely a problem in my code. When I strip everything out my logger works as expected. What is strange to me is that my test case in the full application is just two prints in main and a call to the logger before manually exiting. None of the rest of the app initialization is run but I am linking in all support libraries at that point (Microsoft CPP REST SDK, MySQL Connector for C++ and Howard's date library (static)).

It is easy for me to see how something could be stomping this memory but, even in the "full" case in my application, I don't know why the prints on the main thread would work but the next line calling into the logger would fail. If something were going sideways at init I would expect all calls to break.

I also noticed that if I make my logger static the problem goes away. Of course, this changes the memory layout so it doesn't rule out heap / stack smashing. What I do find interesting is that I can declare the logger globally or on the stack at the start of main() and both will segfault in the same way. If I declare the logger as static, however, both global and stack-based declaration work.

Still trying to create a minimal test case which reproduces this.

I am already linking with -lpthread; have been pretty much since the inception of this application.

OS is Fedora 27 x86_64 running on an Intel Xeon. Compiler:

$ g++ --version
g++ (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Asked by MysteryMoose, Feb 26 '18 00:02


1 Answer

It appears that this problem was caused by a bug in tz.cpp which has since been fixed.

The bug was that there was a namespace scope variable whose initialization was not guaranteed in the proper order. This was fixed by turning that variable into a function-local-static to ensure the proper initialization order.
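As a rough illustration of that kind of fix (not the actual tz.cpp code; `install_dir()`, `database_path` and the path strings are hypothetical), the construct-on-first-use idiom replaces a namespace-scope object with a function that owns a local static:

```cpp
#include <cassert>
#include <string>

// Fragile form, shown only as a comment: a namespace-scope object such as
//     const std::string install_dir_var = "/usr/share/zoneinfo";
// may not be constructed yet when an initializer in another translation
// unit reads it, because the order across translation units is unspecified.

// Construct-on-first-use: the string is built the first time the function
// is called, so every caller receives a fully constructed object.
const std::string& install_dir()
{
    static const std::string dir = "/usr/share/zoneinfo";  // hypothetical path
    return dir;
}

// A namespace-scope object that depends on install_dir() during its own
// dynamic initialization. This is safe regardless of initialization order.
const std::string database_path = [] {
    const std::string& dir = install_dir();
    assert(!dir.empty());      // holds: the local static is built before it is returned
    return dir + "/tzdata.zi"; // hypothetical file name
}();
```

The trade-off is a small check on each call to the accessor, in exchange for no longer depending on the order in which translation units are initialized.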

My apologies to all who might have been impacted by this bug. And my thanks to all those who have reported it.

Answered by Howard Hinnant, Oct 20 '22 19:10