Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# Interlocked functions as a lock mechanism?

While I was reading about ReaderWriterLockSlim lock mechanism , There was this guy who suggested that Interlock Functions can be used for a finer locking

Also, I found here another answer from Marc :

...Writes make their change to a cloned copy, then use Interlocked.CompareExchange to swap the reference (re-applying their change if another thread mutated the reference in the interim).

Well , Currently all I know about Interlocked object , is that it is used (in a multithreaded environment) to do atomic additions, compare , compareExchange operations. (and I know how to use it)

But (and here is my question) —

Question

How can I use it as a lock ? ( sample code will be much appreciated)

edit

For simplicity , I'm pasting this code ( which is not thread safe - if Go was called by two threads simultaneously, it would be possible to get a division-by-zero error) :

class ThreadUnsafe
 {
   static int _val1 = 1, _val2 = 1;

   static void Go()
       {
        if (_val2 != 0) Console.WriteLine (_val1 / _val2);
        _val2 = 0;
       }
}

How can I use Interlock to replace the lock (which would have solved the problem) ?

like image 202
Royi Namir Avatar asked Mar 19 '23 12:03

Royi Namir


2 Answers

Interlocked is used to implement lockless algorithms and data structures. It's therefore not "finer locking", or even locking at all. It lets you do small and well-defined operations safely in a multi-threaded environment: for instance, if you want two threads to increment the same variable, you can use Interlocked to do it instead of acquiring a heavyweight lock and using the "regular increment".

However, there are many, many things that you can't do with Interlocked that you could do under a regular lock. For instance, anything that involves modifying more than one variable atomically generally can't be done with Interlocked, so you wouldn't be able to use it for your example.

Interlocked can, however, help developers implement locking mechanism, though you might as well use the built-in ones. Most locks require kernel support to interrupt the blocked thread until the lock becomes available. Therefore, the only kind of lock that you can implement with Interlocked alone is a spin lock: a lock that threads continually try to acquire until it works.

class InterlockedLock
{
    private int locked;

    public void Lock()
    {
        while (Interlocked.CompareExchange(ref locked, 1, 0) != 0)
            continue; // spin
    }

    public void Unlock()
    {
        locked = 0;
    }
}

In this example, locked is initially zero. When a thread tries to acquire it for the first time, locked becomes 1, and subsequent threads trying to Lock it will spin until someone calls Unlock to make locked 0 again.

like image 132
zneak Avatar answered Mar 22 '23 03:03

zneak


Interlocked.CompareExchange can be used to implement the equivalent of a SpinLock.

However it is perhaps better to lock at Interlocked as a way of avoiding locks entirely. As @zneak stated "Interlocked is used to implement lockless algorithms and data structures."

However I'm not sure that I agree with @zneak in the statement that "there are many, many things that you can't do with Interlocked". Not entirely true. Everything can be done with Interlocked. Its more a question of what is the point of using a mechanism built for avoiding locks in order to implement a lock... Just because it can be done does not mean it is a good idea. Kind of like trying to play soccer with a ping pong ball. Or using a formula 1 go kart to tow a ton of bricks. Sure you can do it, but... would you? really...

Note: In the code sample from @zneak a change is needed for it to be atomic (not read / write to L2 cache or some register etc...). In C# all writes are volatile by default but reads are not.

Either:

private int locked;

should be (to make the reads volatile)

private volatile int locked;

OR:

locked = 0;

should be (to force invalidation of any cached copies of this variable)

Interlocked.Exchange(ref locked, 0);

There are many valid use cases for using Interlocked.CompareExchange as a lock free mechanism. For example singleton logic in say trimming an object pool by spinning of a task are X number of events. When starting the task to trim the object pool we do not know if the previous instance has finished. In other words execute the block of code only of another thread is not already executing it.

Another scenario is where you might want a lock that is a system of gates. In other words if state is 1 then do blah, if 2 then do blah blah, if 3 then do blah blah blah.

Here are some examples:

A thread safe looping counter: https://github.com/tcwicks/ChillX/blob/master/src/ChillX.Core/Structures/ThreadsafeCounter.cs

A lock free thread safe queue: https://github.com/tcwicks/ChillX/blob/master/src/ChillX.Core/Structures/LockFreeQueue.cs

Note: It is important to realize is that in non trivial lock free implementations using Interlocked we will usually either spending a bit of cpu time spinning, or skipping tasks and coming back to them later, and/or compromising on some functions. For example it is "probably" impossible to implement a reliable threadsafe enumerator for the lock free queue above.

Note regarding using Interlocked, Monitor, ReaderWriterLock, ReaderWriterLockSlim etc... these will enforce a memory barrier which is quite expensive:

From: https://learn.microsoft.com/en-us/archive/msdn-magazine/2005/october/understanding-low-lock-techniques-in-multithreaded-apps

Unfortunately, interlocked operations are relatively expensive, on the order of several dozen to up to hundreds of times the cost of a normal instruction. The lower bound on the cost is related to the fact that interlocked operations guarantee that the update is atomic. This requires the processor to insure that no other processor is also trying to execute an interlocked operation on the same location at the same time. This round-trip communication with the memory system can take dozens of instruction cycles. The upper bound on the cost relates to the ordering guarantee these instructions provide. Interlocked instructions need to ensure that caches are synchronized so that reads and writes don't seem to move past the instruction. Depending on the details of the memory system and how much memory was recently modified on various processors, this can be pretty expensive (hundreds of instruction cycles).

Also bear in mind that depending on how long the lock is to be held a spinlock can provide better performance than the alternative mechanisms. However in these same scenarios Spinlock will outperform trying to do the same using Interlocked.CompareExchange. This is simply because the memory barrier is enforced when entering the Spinlock and optionally when we exit Spinlock.Exit(true). On the other hand (see code example below) with spinning on an Interlocked.CompareExchange we are enforcing a memory barrier multiple times again and again as we spin and once when we exit using Interlocked.Exchange.

From the same reference:

In a spin lock, exiting the lock does not require an interlocked operation because only the thread that owns the lock has access to the isEntered memory.

On true multiprocessor machines it is better to busy-wait rather than force a context switch with Sleep(0). Busy-waits should be done through something like System.Threading.SpinWait so that hyperthreaded processors are told that no useful work is being done and another logical CPU can use the hardware

Compare the performance of the lock free queue: https://github.com/tcwicks/ChillX/blob/master/src/TestApps/ChillX.MQServer.Benchmark/LockFreeQueue.cs

against its equivalent which uses locking: https://github.com/tcwicks/ChillX/blob/master/src/ChillX.Core/Structures/ThreadSafeQueue.cs

Above lock free queue is an extended version of Julian M Bucknall's Lock Free Queue. https://secondboyet.com/Articles/LockfreeQueue.html His standard implementation is faster however it creates an object allocation on every insert which needs to then be cleared by the Garbage collector. The end result under heavy use or benchmark is 60% time spent in GC. The extended version above uses memory pooled linked list nodes. However it still cannot keep up with the performance of its counterpart which uses locking. It comes close but considering that it has the additional overhead of linked list nodes which create GC pressure it is simply inferior since it cannot even match its counterparts performance.

This link below is a 99% lock free structure implemented from scratch without pooling but using a queue of ring buffers to minimize allocations. Its performance is twice as fast as the LockFreeQueue above. 367ms vs 607ms for 1000000 queue / dequeue operations across 8 threads. however it is still 50% slower than ThreadSafeQueue which uses ReaderWriterLockSlim.

https://github.com/tcwicks/ChillX/blob/master/src/ChillX.Core/Structures/LockFreeRingBufferQueue.cs

Since you asked for an example of using Interlocked as a lock. The following code demonstrates a SpinLock using Interlocked.CompareExchange instead of SpinLock. In this code example if syncronization does not work then then RunCounter will move to -1 or 2. If it works then RunCounter will stay within the values of 0 and 1.

private static volatile int StopSignal = 0;
private static int IsRunning = 0;
private static int RunCounter = 0;
private static object locker = new object();
private static void ContendIncDec()
{
    while (StopSignal == 0)
    {
        //equivalent of SpinLock.Enter().
        while (Interlocked.CompareExchange(ref IsRunning, 1, 0) != 0)
        {
            //And we are spinning. 
            //But why not SpinLock.Enter() instead cause it is a far better version the same thing...
            //Sure we could do a Thread.Sleep(0); but Monitor does a far better job by blocking the thread.
        }
        if (RunCounter == 0) { RunCounter++; }
        //Do more cool stuff here

        //equivalent of SpinLock.Exit().
        Interlocked.Exchange(ref IsRunning, 0);


        //equivalent of SpinLock.Enter().
        while (Interlocked.CompareExchange(ref IsRunning, 1, 0) != 0)
        {
            //And we are spinning. 
            //But why not SpinLock.Enter() instead cause it is a far better version the same thing...
            //Sure we could do a Thread.Sleep(0); but Monitor does a far better job  by blocking the thread once and resuming once.
        }
        if (RunCounter == 1) { RunCounter--; }
        //Do more cool stuff here

        //equivalent of SpinLock.Exit().
        Interlocked.Exchange(ref IsRunning, 0);
    }
}

static void Main(string[] args)
{
    List<Thread> testThreads = new List<Thread>();
    for (int I = 0; I < 10; I++)
    {
        Thread T = new Thread(new ThreadStart(ContendIncDec));
        testThreads.Add(T);
        T.Start();
    }
    Stopwatch swTest = new Stopwatch();
    swTest.Start();
    while (swTest.ElapsedMilliseconds < 5000)
    {
        Thread.Sleep(0);
    }
    Interlocked.Exchange(ref StopSignal, 1);
    foreach(Thread t in testThreads)
    {
        t.Join();
    }
    Console.WriteLine(@"FinalCount is: {0}  (Note: this should be either 1 or 0 if the locking structure works.)", RunCounter);

As a final note here are some benchmarks where the work done within the locked section is the smallest possible realistic work. In this case 4 threads are queueing 10,000 items each while another 4 threads are DeQueueing the 40,000 items. All 8 threads are competing for the same shared queue. Threads are started and then blocked using a ManualResetEvent so they all started doing their work simultaneously.

Using the Interlocked structure in the code sample above has the worst and most inconsistent performance of the lot with some runs taking 80ms and some taking 521 ms.

An interesting note is that in such a do minimal work scenario SpinLock.Enter() should have won over ReaderWriterLockSlim.EnterWriteLock().

However ReaderWriterLockSlim.EnterWriteLock() is almost 4 times faster.

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1645 (21H2)
AMD Ryzen Threadripper 3970X, 1 CPU, 64 logical and 32 physical cores
.NET SDK=6.0.202
  [Host]   : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT  [AttachedDebugger]
  .NET 6.0 : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT

Job=.NET 6.0  Runtime=.NET 6.0

|                Method |           m_TestMode | numRepititions | numThreads |      Mean |     Error |     StdDev | Completed Work Items | Lock Contentions | Allocated |
|---------------------- |--------------------- |--------------- |----------- |----------:|----------:|-----------:|---------------------:|-----------------:|----------:|
| Bench_LockPerformance |     LockReaderWriter |          40000 |          4 | 166.47 ms | 15.845 ms |  46.719 ms |                    - |                - |  22,672 B |
| Bench_LockPerformance | LockReaderWriterSlim |          40000 |          4 |  19.93 ms |  1.047 ms |   3.053 ms |                    - |                - |      24 B |
| Bench_LockPerformance |          LockMonitor |          40000 |          4 |  21.73 ms |  0.896 ms |   2.643 ms |                    - |         111.4063 |       4 B |
| Bench_LockPerformance | SpinLock.Exit(false) |          40000 |          4 |  80.75 ms |  6.081 ms |  17.643 ms |                    - |                - |      96 B |
| Bench_LockPerformance |  SpinLock.Exit(true) |          40000 |          4 |  89.09 ms |  6.515 ms |  19.108 ms |                    - |                - |      60 B |
| Bench_LockPerformance |      LockInterlocked |          40000 |          4 | 321.08 ms | 54.203 ms | 159.818 ms |                    - |                - |      69 B |

Here are the benchmark results for LockFreeRingBufferQueue versus ThreadSafeQueue (ReaderWriterLockSlim) versus the standard ConcurrentQueue

OS=Windows 10.0.19044.1645 (21H2)
AMD Ryzen Threadripper 3970X, 1 CPU, 64 logical and 32 physical cores
.NET SDK=6.0.202
  [Host]   : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT  [AttachedDebugger]
  .NET 6.0 : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT

Job=.NET 6.0  Runtime=.NET 6.0

| Method                 | m_TestMode           | numRepititions | numThreads | Mean      | Error      | StdDev     | Lock Contentions  |     Gen 0  |     Gen 1  |    Gen 2  | Allocated  |
|----------------------- |--------------------- |--------------- |----------- |----------:| ----------:| ----------:| -----------------:| ----------:| ----------:| ---------:| ----------:|
| Bench_QueuePerformance | RingBufferQueue      | 1000000        | 1          | 80.31 ms  | 3.194 ms   | 8.742 ms   | -                 | 3400.0000  | 1400.0000  | -         | 28,566 KB  |
| Bench_QueuePerformance | ThreadSafeQueue      | 1000000        | 1          | 96.60 ms  | 1.910 ms   | 3.901 ms   | -                 | -          | -          | -         | 1 KB       |
| Bench_QueuePerformance | ConcurrentQueueCount | 1000000        | 1          | 70.94 ms  | 1.410 ms   | 2.750 ms   | -                 | 500.0000   | 500.0000   | 500.0000  | 2,948 KB   |
| Bench_QueuePerformance | ConcurrentQueueTry   | 1000000        | 1          | 49.13 ms  | 2.274 ms   | 6.704 ms   | 0.1250            | -          | -          | -         | 283 KB     |
| Bench_QueuePerformance | RingBufferQueue      | 1000000        | 4          | 367.08 ms | 10.954 ms  | 32.128 ms  | -                 | 3000.0000  | 1000.0000  | -         | 28,566 KB  |
| Bench_QueuePerformance | ThreadSafeQueue      | 1000000        | 4          | 253.59 ms | 12.344 ms  | 36.398 ms  | -                 | -          | -          | -         | 1 KB       |
| Bench_QueuePerformance | ConcurrentQueueCount | 1000000        | 4          | 341.09 ms | 12.249 ms  | 36.116 ms  | 18.5000           | -          | -          | -         | 1,540 KB   |
| Bench_QueuePerformance | ConcurrentQueueTry   | 1000000        | 4          | 266.15 ms | 13.365 ms  | 39.408 ms  | 11.6667           | 333.3333   | 333.3333   | 333.3333  | 2,223 KB   |

Benchmark implementation if you want to try it out is here: https://github.com/tcwicks/ChillX/blob/master/src/TestApps/ChillX.MQServer.Benchmark/Bench_Queue.cs

If you download the repository https://github.com/tcwicks/ChillX and run ChillX.MQServer.Benchmark console app - choose 2 from the menu to benchmark queues

like image 35
tcwicks Avatar answered Mar 22 '23 01:03

tcwicks