When to use volatile to counteract compiler optimizations in C#

Tags:

c#

compiler-optimization

multithreading

.net-4.0

I have spent many weeks doing multithreaded coding in C# 4.0. However, one question remains unanswered for me.

I understand that the volatile keyword prevents the compiler from storing variables in registers, which avoids inadvertently reading stale values. Writes are always volatile in .NET, so any documentation stating that it also avoids stale writes is redundant.

I also know that compiler optimizations are somewhat "unpredictable". The following code illustrates a stall caused by a compiler optimization (when running the Release build outside of Visual Studio):

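using System.Threading;

class Test
{
    public struct Data
    {
        public int _loop;
    }

    public static Data data;

    public static void Main()
    {
        data._loop = 1;
        Test test1 = new Test(); // unused

        // Worker thread clears the flag.
        new Thread(() =>
        {
            data._loop = 0;
        }
        ).Start();

        // Spin until the worker's write is observed.
        do
        {
            if (data._loop != 1)
            {
                break;
            }

            //Thread.Yield();
        } while (true);

        // will never terminate (Release build, run outside the debugger)
    }
}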

The code behaves as expected. However, if I uncomment the //Thread.Yield(); line, the loop exits.

Further, if I put a Thread.Sleep call before the do loop, the loop exits. I don't understand why.

Naturally, decorating _loop with volatile also causes the loop to exit (with the loop exactly as shown).
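For comparison, here is a minimal sketch of the repro with _loop marked volatile. It keeps the same Data struct layout; the class name VolatileRepro and its Run method are hypothetical. Every read of a volatile field is a volatile read, so the JIT has to re-read the field on each iteration instead of hoisting the read out of the loop.

```csharp
using System;
using System.Threading;

class VolatileRepro
{
    public struct Data
    {
        // volatile: every read is a volatile (acquire) read that the JIT
        // must not cache in a register or hoist out of the loop.
        public volatile int _loop;
    }

    public static Data data;

    // Returns true once the spin loop has observed the worker's write.
    public static bool Run()
    {
        data._loop = 1;

        var worker = new Thread(() => { data._loop = 0; });
        worker.Start();

        // Same loop shape as the question, without the Thread.Yield call.
        do
        {
            if (data._loop != 1)
            {
                break;
            }
        } while (true);

        worker.Join();
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(Run() ? "loop exited" : "unreachable");
    }
}
```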

My question is: What are the rules the compiler follows in order to determine when to implicitly perform a volatile read? And why can I still get the loop to exit with what I consider to be odd measures?

EDIT

IL for code as shown (stalls):

L_0038: ldsflda valuetype ConsoleApplication1.Test/Data ConsoleApplication1.Test::data
L_003d: ldfld int32 ConsoleApplication1.Test/Data::_loop
L_0042: ldc.i4.1 
L_0043: beq.s L_0038
L_0045: ret

IL with Yield() (does not stall):

L_0038: ldsflda valuetype ConsoleApplication1.Test/Data ConsoleApplication1.Test::data
L_003d: ldfld int32 ConsoleApplication1.Test/Data::_loop
L_0042: ldc.i4.1 
L_0043: beq.s L_0046
L_0045: ret 
L_0046: call bool [mscorlib]System.Threading.Thread::Yield()
L_004b: pop 
L_004c: br.s L_0038


asked Dec 07 '11 11:12

IamIC


1 Answer

What are the rules the compiler follows in order to determine when to implicitly perform a volatile read?

First, it is not just the compiler that moves instructions around. The three big actors that cause instruction reordering are:

Compiler (like C# or VB.NET)
Runtime (like the CLR or Mono)
Hardware (like x86 or ARM)

The rules at the hardware level are a little more cut and dried because they are usually well documented. At the runtime and compiler levels, memory model specifications constrain how instructions can be reordered, but it is left up to the implementers to decide how aggressively they optimize the code and how closely they toe the line with respect to those constraints.

For example, the ECMA specification for the CLI provides fairly weak guarantees, but Microsoft decided to tighten those guarantees in the .NET Framework CLR. Other than a few blog posts, I have not seen much formal documentation on the rules the CLR adheres to. Mono, of course, might use a different set of rules that may or may not bring it closer to the ECMA specification. And the rules may change in future releases, as long as they still conform to the formal ECMA specification.

With all of that said, I have a few observations:

Compiling with the Release configuration is more likely to cause instruction reordering.
Simpler methods are more likely to have their instructions reordered.
Hoisting a read from inside a loop to outside of the loop is a typical type of reordering optimization.
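To make the hoisting observation concrete, the sketch below contrasts a loop as written with a hand-written version of what an optimizer may produce when the field is not volatile and no barrier is present. The class and method names are hypothetical. This illustrates the transformation at the source level and is not actual JIT output. With the flag already cleared, both methods return immediately. The difference only appears when another thread clears the flag while the loop is already spinning.

```csharp
using System;

static class HoistingIllustration
{
    public static int Flag; // ordinary (non-volatile) static field

    // The loop as written: Flag is logically re-read on every iteration.
    public static void AsWritten()
    {
        while (Flag == 1)
        {
        }
    }

    // What hoisting effectively produces: Flag is read once. If it was 1 at
    // that moment, the loop spins forever no matter what other threads write.
    public static void AsHoisted()
    {
        int cached = Flag;
        while (cached == 1)
        {
        }
    }

    public static void Main()
    {
        Flag = 0; // already cleared, so both versions return immediately
        AsWritten();
        AsHoisted();
        Console.WriteLine("both returned");
    }
}
```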

And why can I still get the loop to exit with what I consider to be odd measures?

It is because those "odd measures" are doing one of two things:

generating an implicit memory barrier
circumventing the compiler's or runtime's ability to perform certain optimizations

For example, if the code inside a method gets too complex, it may prevent the JIT compiler from performing certain optimizations that reorder instructions. This is similar to how complex methods also do not get inlined.
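As an illustration of defeating an optimization rather than adding a barrier, the sketch below moves the read into a helper marked MethodImplOptions.NoInlining. Because the JIT cannot see into the call, the load happens inside the callee on every iteration, so in practice the loop exits. The names are hypothetical, and this is a demonstration, not a correct synchronization technique. The memory model does not guarantee this behavior; volatile or an explicit barrier is the proper fix.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

static class NoInliningDemo
{
    private static int _loop; // deliberately NOT volatile

    // The JIT will not inline this method, so each call performs its own
    // load of _loop inside the callee.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int ReadLoop()
    {
        return _loop;
    }

    public static bool Run()
    {
        _loop = 1;
        var worker = new Thread(() => { _loop = 0; });
        worker.Start();

        while (ReadLoop() == 1)
        {
        }

        worker.Join();
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```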

Also, calls like Thread.Yield and Thread.Sleep create implicit memory barriers. I have started a list of such mechanisms here. I bet that putting a Console.WriteLine call in your loop would also cause it to exit. I have also seen the "non-terminating loop" example behave differently across versions of the .NET Framework. For example, I bet that code would terminate on .NET 1.0.

This is why using Thread.Sleep to simulate thread interleaving could actually mask a memory barrier problem.

Update:

After reading through some of your comments, I think you may be confused about what Thread.MemoryBarrier actually does. It creates a full-fence barrier. What does that mean exactly? A full-fence barrier is the composition of two half-fences: an acquire fence and a release fence. I will define them now.

Acquire fence: A memory barrier in which other reads & writes are not allowed to move before the fence.
Release fence: A memory barrier in which other reads & writes are not allowed to move after the fence.
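The two half-fences are meant to be used in pairs. In the sketch below (hypothetical names), the writer stores a payload and then publishes a flag with Thread.VolatileWrite, which has release semantics, so the payload write cannot move after the flag write. The reader spins on Thread.VolatileRead, which has acquire semantics, so its read of the payload cannot move before the flag read. Both methods exist on System.Threading.Thread in .NET 4.0.

```csharp
using System;
using System.Threading;

static class PublishDemo
{
    private static int _payload; // ordinary field holding the data
    private static int _ready;   // 0 = not published, 1 = published

    public static int Run()
    {
        _payload = 0;
        _ready = 0;

        var writer = new Thread(() =>
        {
            _payload = 42;                       // 1) write the data
            Thread.VolatileWrite(ref _ready, 1); // 2) release: (1) cannot move after this
        });
        writer.Start();

        // Acquire: the read of _payload below cannot move before this read.
        while (Thread.VolatileRead(ref _ready) == 0)
        {
        }

        int observed = _payload;
        writer.Join();
        return observed;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 42
    }
}
```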

So when you see a call to Thread.MemoryBarrier, it prevents all reads and writes from being moved either above or below the barrier. It also emits whatever CPU-specific instructions are required.
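Applied to the loop from the question, a minimal sketch places Thread.MemoryBarrier inside the loop. The class name is hypothetical, and it uses a plain static int instead of the Data struct for brevity. No read may move across the full fence, so the read of the flag cannot be hoisted above the barrier and out of the loop. Each iteration therefore performs a fresh read.

```csharp
using System;
using System.Threading;

static class BarrierLoopDemo
{
    private static int _loop; // not volatile; the barrier does the work

    public static bool Run()
    {
        _loop = 1;
        var worker = new Thread(() => { _loop = 0; });
        worker.Start();

        do
        {
            // Full fence: the read below cannot be moved above this call,
            // so it cannot be hoisted out of the loop.
            Thread.MemoryBarrier();
            if (_loop != 1)
            {
                break;
            }
        } while (true);

        worker.Join();
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(Run() ? "loop exited" : "unreachable");
    }
}
```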

If you look at the code for Thread.VolatileRead, here is what you will see.

public static int VolatileRead(ref int address)
{
    int num = address; // 1) perform the read
    MemoryBarrier();   // 2) then fence: later reads/writes cannot move above it
    return num;
}

Now you may be wondering why the MemoryBarrier call comes after the actual read. Your intuition may tell you that to get a "fresh" read of address, the call to MemoryBarrier would need to occur before that read. But, alas, your intuition is wrong! The specification says a volatile read should produce an acquire-fence barrier. Per the definition I gave above, that means the call to MemoryBarrier has to come after the read of address to prevent other reads and writes from being moved before it. Volatile reads are not strictly about getting a "fresh" read; they are about preventing the movement of instructions. This is incredibly confusing, I know.
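The same reasoning, run in reverse, explains the write side. A release fence has to come before the write so that earlier reads and writes cannot sink below it. The sketch below hand-rolls both halves with Thread.MemoryBarrier, mirroring the VolatileRead code above. AcquireRead and ReleaseWrite are hypothetical helper names. In real code, Thread.VolatileRead/Thread.VolatileWrite or the volatile keyword would be the normal choice.

```csharp
using System;
using System.Threading;

static class HandRolledFences
{
    private static int _payload;
    private static int _ready;

    // Read first, then fence: later reads/writes cannot move above the read
    // (acquire). Same shape as Thread.VolatileRead.
    public static int AcquireRead(ref int location)
    {
        int value = location;
        Thread.MemoryBarrier();
        return value;
    }

    // Fence first, then write: earlier reads/writes cannot move below the
    // write (release).
    public static void ReleaseWrite(ref int location, int value)
    {
        Thread.MemoryBarrier();
        location = value;
    }

    public static int Run()
    {
        _payload = 0;
        _ready = 0;

        var writer = new Thread(() =>
        {
            _payload = 7;                // data write
            ReleaseWrite(ref _ready, 1); // cannot be reordered before the data write
        });
        writer.Start();

        while (AcquireRead(ref _ready) == 0)
        {
        }

        int observed = _payload; // cannot be reordered before the flag read
        writer.Join();
        return observed;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 7
    }
}
```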

answered Sep 29 '22 15:09

Brian Gideon
