Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it okay to "double check" before and inside a lock before running the code inside?

On working with thread-safety, I find myself always "double checking" before executing code in a lock block and I wondered if I was doing the right thing. Consider the following three ways of doing the same thing:

Example 1:

private static SomeCollection MyCollection;
private static Object locker;
private void DoSomething(string key)
{
    if(MyCollection[key] == null)
    {
         lock(locker)
         {
              MyCollection[key] = DoSomethingExpensive(); 
         }
    }
    DoSomethingWithResult(MyCollection[key]);
}

Example 2:

private static SomeCollection MyCollection;
private static Object locker;
private void DoSomething(string key)
{
    lock(locker)
    {
         if(MyCollection[key] == null)
         {
              MyCollection[key] = DoSomethingExpensive(); 
         }
    }
    DoSomethingWithResult(MyCollection[key]);
}

Example 3:

private static SomeCollection MyCollection;
private static Object locker;
private void DoSomething(string key)
{
    if(MyCollection[key] == null)
    {
        lock(locker)
        {
             if(MyCollection[key] == null)
             {
                  MyCollection[key] = DoSomethingExpensive(); 
             }
        }
    }
    DoSomethingWithResult(MyCollection[key]);
}

I always lean towards Example 3, and here's why I think I'm doing the right thing

  • Thread 1 enters DoSomething(string)
  • MyCollection[key] == null so Thread 1 obtains a lock, just as Thread 2 enters
  • MyCollection[key] == null is still true, so Thread 2 waits to obtain the lock
  • Thread 1 calculates the value for MyCollection[key] and adds it to the collection
  • Thread 1 releases the lock and calls DoSomethingWithResult(MyCollection[key]);
  • Thread 2 obtains the lock, by which time MyCollection[key] != null
  • Thread 2 does nothing, releases the lock and continues on its merry way

Example 1 would work, but there is a big risk that Thread 2 could redundantly calculate MyCollection[key].

Example 2 would work, but every thread would obtain a lock, even if it didn't need to - which could be a (admittedly very small) bottleneck. Why hold up threads if you don't need to?

Am I overthinking this and if so, what is the preferred way of handling these situations?

like image 941
Iain Fraser Avatar asked Nov 06 '12 23:11

Iain Fraser


2 Answers

The first method should not be used. As you realised, it leaks, so more than one thread can end up running the expensive method. The longer that method takes, the bigger is the risk that another thread will also run it. In most cases it's only a performance problem, but in some cases it might also be a problem that the resulting data is later on replaced by a new set of data.

The second method is the most common way, the third method is used if the data is accessed so frequently that the locking becomes a performance issue.

like image 51
Guffa Avatar answered Sep 21 '22 11:09

Guffa


I'll introduce some sort of uncertainty, because the problem is not trivial. Basically I agree with Guffa and I'd choose second example. It's because the first is broken while third in turn, despite the fact is seems to be optimized, is tricky. That's why I'll focus on the third one here:

if (item == null)
{
    lock (_locker)
    {
        if (item == null)
            item = new Something();
    }
}

At first sight it may occur as improving performance without locking all the time, but there is also problematic, because of the memory model (reads may be reordered to come before writes), or aggressive compiler optimization (reference), for example:

  1. Thread A notices that the value item is not initialized, so it obtains the lock and begins to initialize the value.
  2. Due to memory model, compiler optimizations and so on, the code generated by the compiler is allowed to update the shared variable to point to a partially constructed object before A has finished performing the initialization.
  3. Thread B notices that the shared variable has been initialized (or so it appears), and returns its value. Because thread B believes the value is already initialized, it does not acquire the lock. If the variable is used before A finishes initializing it, the program will likely crash.

There are solutions for that problem:

  1. You can defined item as a volatile variable, which assures that reading variable will be always up to date. Volatile is used to create a memory barrier between reads and writes on the variable.

    (see The need for volatile modifier in double checked locking in .NET and Implementing the Singleton Pattern in C#)

  2. You can use MemoryBarrier (item non-volatile):

    if (item == null)
    {
        lock (_locker)
        {
            if (item == null)
            {
                var temp = new Something();
                // Insure all writes used to construct new value have been flushed.
                System.Threading.Thread.MemoryBarrier();                     
                item = temp;
            }
        }
    }
    

    The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier execute after memory accesses that follow the call to MemoryBarrier.

    (see Thread.MemoryBarrier Method and this topic)

UPDATE: Double-Check Locking, if implemented correctly, seems to be working fine in C#. For more details check additional references e.g. MSDN, MSDN magazine and this answer.

like image 30
jwaliszko Avatar answered Sep 18 '22 11:09

jwaliszko