Do the Linux glibc pthread functions on x86_64 act as fences for weakly-ordered memory accesses? (pthread_mutex_lock/unlock are the exact functions I'm interested in).
SSE and SSE2 provide some instructions with weak memory ordering (in particular non-temporal stores such as movntps). If you use these instructions and want to guarantee that another thread/core observes them in order, then I understand you need an explicit fence, e.g., an sfence instruction.
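A minimal sketch of that pattern, assuming C11 atomics, the GCC/Clang SSE intrinsics from <immintrin.h>, and pthreads (the names producer, consumer, buffer, ready and run_stream_demo are hypothetical). The producer fills a buffer with _mm_stream_ps (movntps), issues _mm_sfence(), and only then publishes a flag with a release store:

```c
#include <immintrin.h>   /* _mm_stream_ps, _mm_set1_ps, _mm_sfence */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define N 1024  /* number of floats; must be a multiple of 4 */

/* movntps requires a 16-byte-aligned destination. */
static _Alignas(16) float buffer[N];
static atomic_bool ready = false;

/* Producer: write the buffer with weakly-ordered non-temporal stores,
   then publish it through the flag. */
static void *producer(void *arg)
{
    (void)arg;
    __m128 v = _mm_set1_ps(1.0f);
    for (size_t i = 0; i < N; i += 4)
        _mm_stream_ps(&buffer[i], v);   /* movntps */

    /* On x86_64 the release store below is a plain mov, which does not
       order earlier non-temporal stores. The sfence makes the streamed
       data globally visible before the flag store. */
    _mm_sfence();
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Consumer: wait for the flag, then read the data. */
static void *consumer(void *arg)
{
    float *sum = arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* spin until the producer publishes */
    float s = 0.0f;
    for (size_t i = 0; i < N; ++i)
        s += buffer[i];
    *sum = s;
    return NULL;
}

/* Runs one producer and one consumer; returns the consumer's sum. */
float run_stream_demo(void)
{
    pthread_t p, c;
    float sum = 0.0f;
    atomic_store(&ready, false);
    pthread_create(&c, NULL, consumer, &sum);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```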
Normally you expect the pthread API to act as a fence where appropriate. However, I suspect ordinary C code on x86 never generates weakly-ordered memory accesses, so I'm not confident that pthreads needs to act as a fence for them.
From reading the glibc pthread source code, a mutex is ultimately implemented with "lock cmpxchgl", at least on the uncontended path. So I guess what I really need to know is: does that instruction act as a fence for SSE/SSE2 weakly-ordered accesses?
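For concreteness, a minimal sketch of the kind of code in question: non-temporal stores inside a pthread_mutex_t critical section. It takes the conservative route of calling _mm_sfence() right before pthread_mutex_unlock, so it does not depend on which instruction the unlock uses internally. The helper names (writer, reader, shared, run_mutex_demo) are hypothetical.

```c
#include <assert.h>
#include <immintrin.h>   /* _mm_stream_ps, _mm_set1_ps, _mm_sfence */
#include <pthread.h>
#include <stddef.h>

#define COUNT 256  /* floats per update; must be a multiple of 4 */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static _Alignas(16) float shared[COUNT];  /* movntps needs 16-byte alignment */

/* Writer: overwrite the whole buffer with non-temporal stores
   while holding the mutex. */
static void *writer(void *arg)
{
    __m128 v = _mm_set1_ps(*(const float *)arg);
    pthread_mutex_lock(&lock);
    for (size_t i = 0; i < COUNT; i += 4)
        _mm_stream_ps(&shared[i], v);
    /* Drain the weakly-ordered stores before releasing the mutex. */
    _mm_sfence();
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Reader: take the mutex and check the buffer is never half-updated. */
static void *reader(void *arg)
{
    float *seen = arg;
    pthread_mutex_lock(&lock);
    float first = shared[0];
    for (size_t i = 1; i < COUNT; ++i)
        assert(shared[i] == first);
    pthread_mutex_unlock(&lock);
    *seen = first;
    return NULL;
}

/* Runs one writer and one reader concurrently. Returns what the reader
   observed: the old contents (zero on the first call) or `value`. */
float run_mutex_demo(float value)
{
    pthread_t w, r;
    float seen = -1.0f;
    pthread_create(&w, NULL, writer, &value);
    pthread_create(&r, NULL, reader, &seen);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```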
Non-temporal stores need an sfence instruction to be ordered properly.
However, an efficient user-level implementation of a simple mutex assumes that it is released by a plain write, which does not imply a write-buffer flush, in contrast to atomic read-modify-write operations such as lock cmpxchg, which imply a full memory fence.
So the store-with-release semantics of the unlock do not apply to non-temporal stores. As a result, these SSE stores can become visible after the unlock, i.e., after another thread has already acquired the mutex.
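The release path described above can be sketched with a hypothetical minimal spinlock (spin_lock, spin_unlock, bump and run_spinlock_demo are illustrative names, not glibc's pthread_mutex_t). Acquiring uses a C11 compare-and-swap, which GCC and Clang emit as lock cmpxchg on x86_64; releasing is a single release store, which they emit as a plain mov. The _mm_sfence() before spin_unlock is what keeps the movnti store from becoming visible after another thread has taken the lock.

```c
#include <immintrin.h>   /* _mm_stream_si32, _mm_sfence */
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical minimal spinlock: CAS to acquire, plain store to release. */
typedef struct { atomic_int locked; } spinlock;

static void spin_lock(spinlock *l)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(
               &l->locked, &expected, 1,
               memory_order_acquire, memory_order_relaxed))
        expected = 0;  /* a failed CAS overwrote `expected`; reset and retry */
}

static void spin_unlock(spinlock *l)
{
    /* Compiles to a plain mov on x86_64: no write-buffer flush. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

static spinlock lk = { 0 };
static int counter;  /* only ever written with non-temporal stores */

static void *bump(void *arg)
{
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; ++i) {
        spin_lock(&lk);
        _mm_stream_si32(&counter, counter + 1);  /* movnti */
        /* Without this fence, the movnti could become visible after the
           plain-mov unlock, i.e. after another thread holds the lock. */
        _mm_sfence();
        spin_unlock(&lk);
    }
    return NULL;
}

/* Two threads each increment the counter `iters` times. */
int run_spinlock_demo(int iters)
{
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, bump, &iters);
    pthread_create(&b, NULL, bump, &iters);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Removing the sfence may still appear to work in a quick run, since the reordering is timing-dependent; the fence is there because the memory model permits the reordering, not because a given test happens to show it.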