How to use Intel TSX with C++ memory model?

Tags: c++, language-lawyer, memory-model, intel-tsx

As far as I know, C++ does not yet cover transactional memory of any kind, but TSX can still be fitted, via the "as-if" rule, into something governed by the C++ memory model.

So what happens on a successful HLE operation or a successful RTM transaction?

Saying "there is a data race, but it is OK" is not very helpful, because it does not clarify what "OK" means.

With HLE it can probably be viewed as "the previous operation happens before the subsequent operation, as if the section were still guarded by the lock that was elided".

What about RTM? There is not even an elided lock there, only (potentially non-atomic) memory operations, which could be loads, stores, both, or nothing at all. What synchronizes with what? What happens before what?


asked Apr 21 '20 04:04

Alex Guteniev

1 Answer

Apparently, before digging into the specs or asking on SO, I should have read the "overview" pages thoroughly:

Hardware Lock Elision Overview

The hardware ensures program order of operations on the lock, even though the eliding processor did not perform external write operations to the lock. If the eliding processor itself reads the value of the lock in the critical section, it will appear as if the processor had acquired the lock (the read will return the non-elided value). This behavior makes an HLE execution functionally equivalent to an execution without the HLE prefixes.

Restricted Transactional Memory Overview

RTM Memory Ordering

A successful RTM commit causes all memory operations in the RTM region to appear to execute atomically. A successfully committed RTM region consisting of an XBEGIN followed by an XEND, even with no memory operations in the RTM region, has the same ordering semantics as a LOCK prefixed instruction. The XBEGIN instruction does not have fencing semantics. However, if an RTM execution aborts, all memory updates from within the RTM region are discarded and never made visible to any other logical processor.
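To make the quoted RTM rule concrete, here is a minimal sketch of an RTM region with a lock fallback. It is an illustration under stated assumptions, not a definitive implementation: the names `tsx_sketch`, `fallback_locked`, `shared_counter`, `cpu_has_rtm` and `increment` are hypothetical, the RTM path is built only when GCC or Clang is given `-mrtm`, and the non-atomic increment relies on the hardware guarantee quoted above rather than on anything the C++ standard promises.

```cpp
#include <atomic>

// RTM intrinsics are used only when the compiler was asked to enable them
// (GCC/Clang define __RTM__ under -mrtm); otherwise only the lock path is built.
#if defined(__RTM__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#include <immintrin.h>
#define TSX_SKETCH_HAVE_RTM 1
#endif

namespace tsx_sketch {

std::atomic<bool> fallback_locked{false};  // hypothetical fallback spin lock
long shared_counter = 0;                   // deliberately a plain, non-atomic object

// Runtime check, because XBEGIN faults on processors without RTM.
inline bool cpu_has_rtm() {
#ifdef TSX_SKETCH_HAVE_RTM
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM support.
    return __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) != 0 &&
           (ebx & (1u << 11)) != 0;
#else
    return false;
#endif
}

inline void increment() {
#ifdef TSX_SKETCH_HAVE_RTM
    static const bool use_rtm = cpu_has_rtm();
    if (use_rtm) {
        for (int attempt = 0; attempt < 3; ++attempt) {
            if (_xbegin() == _XBEGIN_STARTED) {
                // Reading the lock puts it into the read set, so a thread that
                // takes the fallback lock aborts this transaction.
                if (fallback_locked.load(std::memory_order_relaxed)) {
                    _xabort(0xff);
                }
                ++shared_counter;  // appears atomic to other threads on commit
                _xend();           // commit: ordered like a LOCK-prefixed instruction
                return;
            }
            // An abort lands here with every update from the region discarded.
        }
    }
#endif
    // Fallback path: an ordinary spin lock with acquire/release ordering.
    while (fallback_locked.exchange(true, std::memory_order_acquire)) {
    }
    ++shared_counter;
    fallback_locked.store(false, std::memory_order_release);
}

}  // namespace tsx_sketch
```

The intended use is to call `increment()` concurrently from several threads. Committed regions and lock holders then exclude each other, which is why the plain `++shared_counter` behaves as if it were one atomic read-modify-write.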

To complete the answer:

LOCK-prefixed instructions map to C++ std::memory_order::seq_cst. This covers all successful RTM transactions (each behaves as if it were a single LOCK-prefixed instruction). It also covers most HLE cases. Specifically:

HLE-prefixed LOCK instructions behave as if they were actually executed, which implies seq_cst as well
The same holds for XACQUIRE XCHG / XRELEASE XCHG: they behave as if executed, which implies seq_cst too
Finally, XRELEASE MOV [mem], op behaves as if it were MOV [mem], op, so it is only a release store (under the usual implementation of the C++ memory model, where the memory fence is attached to the sequentially consistent store, not to the load)
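The list above can be sketched as an elidable spin lock. This is a hedged illustration, not the way to use HLE: the names `hle_sketch`, `HLE_SKETCH_ENABLED` and `ElidableSpinLock` are hypothetical. It assumes GCC's x86 extension flags `__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE`, and falls back to an ordinary `std::atomic` spin lock when the compiler does not define them. On x86 the acquiring exchange is expected to compile to an `XCHG` (with `XACQUIRE` when elision is enabled) and the releasing store to a plain `MOV` (with `XRELEASE`).

```cpp
#include <atomic>

// GCC on x86 defines these extension flags; other compilers may not.
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
#define HLE_SKETCH_ENABLED 1
#endif

namespace hle_sketch {

class ElidableSpinLock {
public:
    void lock() noexcept {
#ifdef HLE_SKETCH_ENABLED
        // Exchange with an HLE acquire hint: behaves as if the lock word had
        // really been written, so it orders like an ordinary locked exchange.
        while (__atomic_exchange_n(&word_, 1,
                                   __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)) {
            __builtin_ia32_pause();  // PAUSE; also aborts a failed elision attempt
        }
#else
        while (word_.exchange(1, std::memory_order_acquire)) {
        }
#endif
    }

    void unlock() noexcept {
#ifdef HLE_SKETCH_ENABLED
        // Plain store with an HLE release hint: only release ordering.
        __atomic_store_n(&word_, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
#else
        word_.store(0, std::memory_order_release);
#endif
    }

private:
#ifdef HLE_SKETCH_ENABLED
    int word_ = 0;
#else
    std::atomic<int> word_{0};
#endif
};

}  // namespace hle_sketch
```

Because the HLE prefixes are ignored by processors that do not implement elision, the same binary degrades to a normal spin lock there, with the same acquire and release ordering.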

(The documentation links are for the Intel compiler. However, they document hardware behavior, so the information should apply to other compilers as well. The only variable a compiler might introduce is compile-time reordering. I expect, however, that a compiler which implements the intrinsics also implements the proper reordering restrictions; if still unsure, place compiler barriers. With direct assembly, mark the assembly code accordingly.)
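As a rough sketch of what "place compiler barriers" could look like, the following uses `std::atomic_signal_fence`, which in practice restricts compile-time reordering without emitting a fence instruction, plus the GCC-style empty `asm` statement with a `"memory"` clobber. The names `barrier_sketch`, `compiler_barrier`, `asm_compiler_barrier` and `store_between_barriers` are hypothetical, and whether such barriers are needed at all around a given compiler's TSX intrinsics is an assumption to check against that compiler's documentation.

```cpp
#include <atomic>

namespace barrier_sketch {

// Compiler-only barrier: the compiler must not move memory accesses across it.
// In practice no CPU fence instruction is generated for it.
inline void compiler_barrier() noexcept {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

#if defined(__GNUC__) || defined(__clang__)
// Equivalent idiom for GCC-style inline assembly. Hand-written XBEGIN/XEND
// sequences would be marked the same way: volatile, with a "memory" clobber.
inline void asm_compiler_barrier() noexcept {
    __asm__ __volatile__("" ::: "memory");
}
#endif

// Hypothetical helper: a plain store that the compiler may not hoist above
// the first barrier or sink below the second one.
inline void store_between_barriers(int& slot, int value) noexcept {
    compiler_barrier();
    slot = value;
    compiler_barrier();
}

}  // namespace barrier_sketch
```

These barriers constrain only the compiler. The run-time ordering still comes from the hardware rules quoted earlier.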


answered Oct 19 '22 17:10

Alex Guteniev
