This question led me to wonder about thread-local storage in high-level development frameworks like Java and .NET. Java has a ThreadLocal<T> class (and perhaps other constructs), while .NET has data slots, and soon a ThreadLocal<T> class of its own. (It also has the ThreadStaticAttribute, but I'm particularly interested in thread-local storage for member data.) Most other modern development environments provide one or more mechanisms for it, either at the language or framework level.
What problems does thread-local storage solve, or what advantages does thread-local storage provide over the standard object-oriented idiom of creating separate object instances to contain thread-local data? In other words, how is this:
// Thread-local storage approach: start 200 threads using the same object.
// Each thread creates a copy of any thread-local data.
ThreadLocalInstance instance = new ThreadLocalInstance();
for (int i = 0; i < 200; i++) {
    ThreadStart threadStart = new ThreadStart(instance.DoSomething);
    new Thread(threadStart).Start();
}
Superior to this?
// Normal OO approach: create 200 objects, start a new thread on each.
for (int i = 0; i < 200; i++) {
    StandardInstance standardInstance = new StandardInstance();
    ThreadStart threadStart = new ThreadStart(standardInstance.DoSomething);
    new Thread(threadStart).Start();
}
I can see that using a single object with thread-local storage could be slightly more memory-efficient and require fewer processor resources due to fewer allocations (and constructions). Are there other advantages?
Thread Local Storage (TLS) is the mechanism by which each thread in a given multithreaded process allocates storage for thread-specific data. In standard multithreaded programs, data is shared among all threads of a given process, whereas thread local storage is the mechanism for allocating per-thread data.
In some ways, TLS is similar to static data. The only difference is that TLS data are unique to each thread. Most thread libraries, including Windows and Pthreads, provide some form of support for thread-local storage; Java provides support as well.
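To make the "static data, but unique to each thread" idea concrete, here is a minimal Java sketch. The class name PerThreadCounter and its methods are made up for illustration; only java.lang.ThreadLocal and its withInitial factory are real library API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadCounter {
    // Declared like a static field, but every thread sees its own Integer, starting at 0.
    private static final ThreadLocal<Integer> COUNTER = ThreadLocal.withInitial(() -> 0);

    static int incrementAndGet() {
        int next = COUNTER.get() + 1; // reads this thread's value only
        COUNTER.set(next);            // writes this thread's value only
        return next;
    }

    // Starts `threads` threads that each increment `times` times and
    // records the last value each thread observed.
    static Map<String, Integer> run(int threads, int times) {
        Map<String, Integer> finalValues = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String name = "worker-" + i;
            workers[i] = new Thread(() -> {
                int last = 0;
                for (int j = 0; j < times; j++) {
                    last = incrementAndGet();
                }
                finalValues.put(name, last);
            }, name);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return finalValues;
    }
}
```

Because no thread ever touches another thread's counter, every worker ends at exactly `times`, with no locking, even though the field is declared once and is static.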
Thread-local storage (TLS) is a mechanism by which variables are allocated such that there is one instance of the variable per extant thread. The runtime model GCC uses to implement this originates in the IA-64 processor-specific ABI, but has since been migrated to other processors as well.
What problems does thread-local storage solve, or what advantages does thread-local storage provide over the standard object-oriented idiom of creating separate object instances to contain thread-local data?
Thread local storage allows you to provide each running thread with a unique instance of a class, which is very valuable when trying to work with non-threadsafe classes, or when trying to avoid synchronization requirements that can occur due to shared state.
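A common Java illustration of the non-threadsafe-class case is java.text.SimpleDateFormat, which is documented as not safe for concurrent use. The sketch below is one way to do it, not the only one; the DateFormatting wrapper class is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class DateFormatting {
    // One formatter per thread, created lazily on that thread's first call.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    });

    // Safe to call from any thread without synchronization, because the
    // formatter instance is never shared between threads.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```

Callers just write DateFormatting.format(someDate); they never see the per-thread instance, and nothing has to be passed into each thread up front.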
As for the advantage vs. your example - if you are spawning a single thread, there is little or no advantage to using thread local storage over passing in an instance. ThreadLocal<T> and similar constructs become incredibly valuable, however, when working (directly or indirectly) with a ThreadPool.
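Here is a rough Java sketch of why the pool case is different. With a pool you do not control which thread runs which work item, so you cannot hand each thread an instance yourself; a thread-local creates one instance per pool thread rather than one per work item. PooledBuffers and runTasks are hypothetical names; the executor classes are standard java.util.concurrent API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PooledBuffers {
    // Counts how many StringBuilder instances were ever created.
    static final AtomicInteger created = new AtomicInteger();

    private static final ThreadLocal<StringBuilder> BUFFER = ThreadLocal.withInitial(() -> {
        created.incrementAndGet();
        return new StringBuilder();
    });

    // Runs `tasks` work items on a pool of `poolSize` threads and returns
    // how many buffers had to be created during this run.
    static int runTasks(int tasks, int poolSize) {
        int before = created.get();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            pool.execute(() -> {
                StringBuilder buffer = BUFFER.get(); // reused by later tasks on the same thread
                buffer.setLength(0);
                buffer.append("task-").append(id);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get() - before;
    }
}
```

Running 200 work items on a 4-thread pool creates at most 4 buffers, not 200. The cost is that each value lives as long as its pool thread does, so anything large stored this way stays in memory until the thread exits or the value is removed.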
For example, I have a specific process I worked on recently, where we are doing some very heavy computation using the new Task Parallel Library in .NET. Certain portions of the computations performed can be cached, and if the cache contains a specific match, we can shave off quite a bit of time when processing one element. However, the cached info had a high memory requirement, so we didn't want to cache more than the last processing step.
However, trying to share this cache across threads is problematic. In order to do so, we'd have to synchronize access to it, and also add some extra checks inside our classes to make them thread safe.
Instead of doing this, I rewrote the algorithm to allow each thread to maintain its own private cache in a ThreadLocal<T>. Since the partitioning scheme the TPL uses tends to keep blocks of elements together, each thread's local cache tended to contain the appropriate values it required.
This eliminated the synchronization issues, but also allowed us to keep our caching in place. The overall benefit was quite large, in this situation.
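The real code isn't shown here, so the following is only a Java sketch of the general shape, under stated assumptions: the cache holds just the last result, the "heavy computation" is a stand-in (squaring a number), and all names (PerThreadCache, LastResult, compute, computeAll) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadCache {
    // Hypothetical one-entry cache: remembers only the last key and its result.
    private static final class LastResult {
        boolean filled;
        long key;
        long value;
    }

    private static final ThreadLocal<LastResult> CACHE = ThreadLocal.withInitial(LastResult::new);

    // Counts how often the expensive path actually ran (for demonstration only).
    static final AtomicInteger expensiveCalls = new AtomicInteger();

    // Stand-in for the heavy computation.
    private static long expensive(long key) {
        expensiveCalls.incrementAndGet();
        return key * key;
    }

    static long compute(long key) {
        LastResult last = CACHE.get(); // private to this thread, so no lock is needed
        if (!last.filled || last.key != key) {
            last.value = expensive(key);
            last.key = key;
            last.filled = true;
        }
        return last.value;
    }

    // Processes the keys on a fixed pool and returns the results in input order.
    static List<Long> computeAll(List<Long> keys, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long key : keys) {
                Callable<Long> task = () -> compute(key);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

How often the cache hits depends on how work is handed to threads. When neighbouring elements land on the same thread, as described above for the TPL's partitioning, repeated keys tend to hit; with a scattered distribution, each thread's cache would miss more often.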
For a more concrete example, take a look at this blog post I wrote on aggregation using the TPL. Internally, the Parallel class uses a ThreadLocal<TLocal> whenever you use the ForEach overload that keeps local state (and the Parallel.For<TLocal> methods, too). This is how the local state is kept separate per thread to avoid locking.
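The local-state pattern can be approximated in Java as follows. This is an analogy only, not how the Parallel class is implemented; LocalStateSum is a hypothetical name, and the fixed range split is a simplification of real partitioning.

```java
import java.util.concurrent.atomic.AtomicLong;

class LocalStateSum {
    static long sum(int[] data, int threads) {
        AtomicLong total = new AtomicLong();
        // "Local init": each thread lazily gets its own one-element accumulator.
        ThreadLocal<long[]> subtotal = ThreadLocal.withInitial(() -> new long[1]);

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            // Split the index range into contiguous blocks, one per thread.
            final int start = (int) ((long) data.length * t / threads);
            final int end = (int) ((long) data.length * (t + 1) / threads);
            workers[t] = new Thread(() -> {
                long[] local = subtotal.get();
                for (int i = start; i < end; i++) {
                    local[0] += data[i]; // body: touches only this thread's state
                }
                total.addAndGet(local[0]); // "local finally": one shared update per thread
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }
}
```

The shared total is touched once per thread instead of once per element, which is where the saving over a lock-per-element approach comes from.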