Handling out of order execution

Tags:

I recently stumbled upon this Wikipedia article. From my experience with multi-threading I am aware of the multitude of issues caused by the program being able to switch threads between threads at any time. However, I never knew that compiler and hardware optimisations could reorder operations in a way that is guaranteed to work for a single thread, but not necessarily for multi-threading. Can anyone explain how to correctly deal with the possibility of reordered operations in a multi-threaded environment?

UPDATE: I originally had accidentally linked to the Out-of-Order Execution article instead of the Memory barrier article, which has a better explanation of the problem.

889

asked Jan 28 '10 23:01

Casebash

2 Answers

I will address your question as one about multithreading in a high-level language, rather than discussing CPU pipeline optimization.

Can anyone explain how to correctly deal with the possibility of reordered operations in a multi-threaded environment?

Most, if not all, modern high-level multithreaded languages provide constructs for managing this potential for the compiler to reorder the logical execution of instructions. In C#, these include field-level constructs (volatile modifier), block-level constructs (lock keyword), and imperative constructs (Thead.MemoryBarrier).

Applying volatile to a field causes all access to that field in the CPU/memory to be executed in the same relative order in which it occurs in the instruction sequence (source code).

Using lock around a block of code causes the enclosed instruction sequence to be executed in the same relative order in which it occurs in the parent block of code.

The Thread.MemoryBarrier method indicates to the compiler that the CPU must not reorder memory access around this point in the instruction sequence. This enables a more advanced technique for specialized requirements.

The techniques above are described in order of increasing complexity and performance. As with all concurrency programming, determining when and where to apply these techniques is the challenge. When synchronizing access to a single field, the volatile keyword will work, but it could prove to be overkill. Sometimes you only need to synchronize writes (in which case a ReaderWriterLockSlim would accomplish the same thing with much better performance). Sometimes you need to manipulate the field multiple times in quick succession, or you must check a field and conditionally manipulate it. In these cases, the lock keyword is a better idea. Sometimes you have multiple threads manipulating shared state in a very loosely-synchronized model to improve performance (not typically recommended). In that case, carefully placed memory barriers can prevent stale and inconsistent data from being used in threads.

119

answered Sep 28 '22 11:09

G-Wiz

Let me ask a question: Given a program code (say it is a single-threaded application), what is the correct execution? Intuitively, executing by CPU in-order as code specifies would be correct. This illusion of sequential execution is what programmers have.

However, modern CPU doesn't obey such restriction. Unless dependences are violated (data dependence, control dependence, and memory dependence), CPUs are executing instructions in out-of-order fashion. However, it is completely hidden to programmers. Programmers can never see what is going on inside of the CPU.

Compilers also exploit such fact. If the program's semantics (i.e., the inherent dependencies in your code) can be preserved, compilers would reorder any possible instruction to achieve better performance. One notable optimization is code hoisting: compilers may hoist load instruction to minimize memory latency. But, don't worry, compilers guarantee its correctness; In any case, compilers will NOT crash your program due to such instructing reordering since compilers must preserve dependencies at least. (But, compilers might have bugs :-)

If you're only considering single-thread application, you do not need to worry about such out-of-order execution either by compilers and CPUs, for a single-thread case.

(To learn more, I recommend you to take a look at the concept of ILP(instruction-level parallelism). Single thread performance is mostly dependent on how much ILP you can extract from a single thread. So, both CPUs and compilers do whatever they can do for better performance.)

However, when you consider multithreaded execution, then it has a potential problem called memory consistency problem. Intuitively programmers have a concept of sequential consistency. However, modern multi-core architectures are doing dirty and aggressive optimizations (e.g., caches and buffers). It is hard to implement sequential consistency with low-overheads in modern computer architecture. So, there could be very confusing situation due to out-of-order executions of memory loads and stores. You may observe some loads and stores had been executed in out-of-order. Read some articles related to relaxed memory models such as Intel x86's memory model (Read Chapter 8, Memory Ordering, of Volume 3A of Intel 64 and IA-32 Architectures Software Developer’s Manual). Memory barriers are needed in this situation where you have to enforce orders of memory instructions for the correctness.

THE ANSWER TO THE QUESTION: It's not easy to answer for this question in short. There is no good tools that detects such out-of-order and problematic behaviors due to memory consistency model (though there are research papers). So, in short, it is even hard for you to find such bugs exist in your code. However, I strongly suggest you to read articles on double-checked locking and its detailed paper. In a double-checked locking, due to relaxed memory consistency and compilers' reordering (note that compilers do not aware multi-threaded behaviors unless you explicitly specify with memory barriers), it may lead misbehavior.

In sum:

If you're only working on a single-threaded program, then you don't need to worry about out-of-order behaviors.
On multi-core, you may need to consider memory consistency problems. But, it's actually rare when you really need to worry about memory consistency issue. Mostly, data races, deadlocks, and atomicity violations kill your multi-threaded program.

answered Sep 28 '22 11:09

10 revs, 2 users 98%

Related questions
                            
                                how can i get the executable's current directory in py2exe?
                            
                                Emacs recursive project search
                            
                                manually define page numbers in latex?
                            
                                How to implement service layer in Zend Framework? [closed]
                            
                                javascript/json date literal
                            
                                How to check user password in ldap whith java with given LdapContext?
                            
                                Using HtmlAnchor or ASP.NET HyperLink for anchor tag that navigates in-page named anchor
                            
                                Determining image sizes for multiple android screen sizes/densities
                            
                                How do I specify a recovery model (full, simple or bulk-logged) when creating a database in T-SQL code?
                            
                                How does Wordpress link posts to categories in its database?
                            
                                In what circumstances can large pages produce a speedup?
                            
                                How to tell Java SAX Parser to ignore invalid character references?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With