Please have a look at the following code:
#include <pthread.h>
#include <boost/atomic.hpp>
class ReferenceCounted {
public:
ReferenceCounted() : ref_count_(1) {}
void reserve() {
ref_count_.fetch_add(1, boost::memory_order_relaxed);
}
void release() {
if (ref_count_.fetch_sub(1, boost::memory_order_release) == 1) {
boost::atomic_thread_fence(boost::memory_order_acquire);
delete this;
}
}
private:
boost::atomic<int> ref_count_;
};
void* Thread1(void* x) {
static_cast<ReferenceCounted*>(x)->release();
return NULL;
}
void* Thread2(void* x) {
static_cast<ReferenceCounted*>(x)->release();
return NULL;
}
int main() {
ReferenceCounted* obj = new ReferenceCounted();
obj->reserve(); // for Thread1
obj->reserve(); // for Thread2
obj->release(); // for the main()
pthread_t t[2];
pthread_create(&t[0], NULL, Thread1, obj);
pthread_create(&t[1], NULL, Thread2, obj);
pthread_join(t[0], NULL);
pthread_join(t[1], NULL);
}
This is somewhat similar to the Reference counting example from Boost.Atomic.
The main differences are that the embedded ref_count_
is initialized to 1
in the constructor (once the constructor is completed we have a single reference to the ReferenceCounted
object) and that the code doesn't use boost::intrusive_ptr
.
Please don't blame me for using delete this
in the code - this is the pattern that I have in a large code base at work and there's nothing I can do about it right now.
Now this code compiled with clang 3.5
from trunk (details below) and ThreadSanitizer (tsan v2) results in the following output from ThreadSanitizer:
WARNING: ThreadSanitizer: data race (pid=9871)
Write of size 1 at 0x7d040000f7f0 by thread T2:
#0 operator delete(void*) <null>:0 (a.out+0x00000004738b)
#1 ReferenceCounted::release() /home/A.Romanek/tmp/tsan/main.cpp:15 (a.out+0x0000000a2c06)
#2 Thread2(void*) /home/A.Romanek/tmp/tsan/main.cpp:29 (a.out+0x0000000a2833)
Previous atomic write of size 4 at 0x7d040000f7f0 by thread T1:
#0 __tsan_atomic32_fetch_sub <null>:0 (a.out+0x0000000896b6)
#1 boost::atomics::detail::base_atomic<int, int, 4u, true>::fetch_sub(int, boost::memory_order) volatile /home/A.Romanek/tmp/boost/boost_1_55_0/boost/atomic/detail/gcc-atomic.hpp:499 (a.out+0x0000000a3329)
#2 ReferenceCounted::release() /home/A.Romanek/tmp/tsan/main.cpp:13 (a.out+0x0000000a2a71)
#3 Thread1(void*) /home/A.Romanek/tmp/tsan/main.cpp:24 (a.out+0x0000000a27d3)
Location is heap block of size 4 at 0x7d040000f7f0 allocated by main thread:
#0 operator new(unsigned long) <null>:0 (a.out+0x000000046e1d)
#1 main /home/A.Romanek/tmp/tsan/main.cpp:34 (a.out+0x0000000a286f)
Thread T2 (tid=9874, running) created by main thread at:
#0 pthread_create <null>:0 (a.out+0x00000004a2d1)
#1 main /home/A.Romanek/tmp/tsan/main.cpp:40 (a.out+0x0000000a294e)
Thread T1 (tid=9873, finished) created by main thread at:
#0 pthread_create <null>:0 (a.out+0x00000004a2d1)
#1 main /home/A.Romanek/tmp/tsan/main.cpp:39 (a.out+0x0000000a2912)
SUMMARY: ThreadSanitizer: data race ??:0 operator delete(void*)
==================
ThreadSanitizer: reported 1 warnings
The strange thing is that thread T1
does a write of size 1 to the same memory location as thread T2
when doing atomic decrement on the reference counter.
How can the former write be explained? Is it some clean-up performed by the destructor of ReferenceCounted
class?
It is a false positive? Or is the code wrong?
My setup is:
$ uname -a
Linux aromanek-laptop 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ clang --version
Ubuntu clang version 3.5-1ubuntu1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix
The code is compiled like this:
clang++ main.cpp -I/home/A.Romanek/tmp/boost/boost_1_55_0 -pthread -fsanitize=thread -O0 -g -ggdb3 -fPIE -pie -fPIC
Note that on my machine the implementation of boost::atomic<T>
resolves to __atomic_load_n
family of functions, which ThreadSanitizer claims to understand.
UPDATE 1: the same happens when using clang 3.4
final release.
UPDATE 2: the same problem occurs with -std=c++11
and <atomic>
with both libstdc++ and libc++.
This looks like a false positive.
The thread_fence
in the release()
method enforces all outstanding writes from fetch_sub
-calls to happen-before the fence returns. Therefore, the delete
on the next line cannot race with the previous writes from decreasing the refcount.
Quoting from the book C++ Concurrency in Action:
A release operation synchronizes-with a fence with an
order
ofstd::memory_order_acquire
[...] if that release operation stores a value that's read by an atomic operation prior to the fence on the same thread as the fence.
Since decreasing the refcount is a read-modify-write operation, this should apply here.
To elaborate, the order of operations that we need to ensure is as follows:
2.
and 3.
are synchronized implicitly, as they happen on the same thread. 1.
and 2.
are synchronized since they are both atomic read-modify-write operations on the same value. If these two could race the whole refcounting would be broken in the first place. So what is left is synchronizing 1.
and 3.
.
This is exactly what the fence does. The write from 1.
is a release
operation that is, as we just discussed, synchronized with 2.
, a read on the same value. 3.
, an acquire
fence on the same thread as 2.
, now synchronizes with the write from 1.
as guaranteed by the spec. This happens without requiring an addition acquire
write on the object (as was suggested by @KerrekSB in the comments), which would also work, but might be potentially less efficient due to the additional write.
Bottom line: Don't play around with memory orderings. Even experts get them wrong and their impact on performance is often negligible. So unless you have proven in a profiling run that they kill your performance and you absolutely positively have to optimize this, just pretend they don't exist and stick with the default memory_order_seq_cst
.
Just to highlight @adam-romanek's comment for others who stumble upon this, at the time of writing (March 2018) ThreadSanitizer does not support standalone memory fences. This is alluded to in the ThreadSanitizer FAQ which doesn't explicitly mention fences are supported:
Q: What synchronization primitives are supported? TSan supports pthread synchronization primitives, built-in compiler atomic operations (sync/atomic), C++ operations are supported with llvm libc++ (not very throughly [sic] tested, though).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With