Interlocked.Exchange slower than Interlocked.CompareExchange?

Question

While optimizing a program I came across some odd performance results, reproduced in the following BenchmarkDotNet benchmark:

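using System.Threading;
using BenchmarkDotNet.Attributes;

// Class name is illustrative; the fields and benchmark methods are unchanged.
public class InterlockedBenchmarks
{
    string _s, _y = "yo";

    [Benchmark]
    public void Exchange() => Interlocked.Exchange(ref _s, null);

    [Benchmark]
    public void CompareExchange() => Interlocked.CompareExchange(ref _s, _y, null);
}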

The results are as follows:

BenchmarkDotNet=v0.10.10, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
Frequency=2531248 Hz, Resolution=395.0620 ns, Timer=TSC
.NET Core SDK=2.1.4
  [Host]     : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT

          Method |      Mean |     Error |    StdDev |
---------------- |----------:|----------:|----------:|
        Exchange | 20.525 ns | 0.4357 ns | 0.4662 ns |
 CompareExchange |  7.017 ns | 0.1070 ns | 0.1001 ns |

It would seem that Interlocked.Exchange is nearly three times as slow as Interlocked.CompareExchange, which is confusing because it's supposed to be doing less work. Unless I'm mistaken, both are supposed to be CPU ops.

Does anyone have a good explanation for why this could be happening? Is this an actual performance difference in the CPU ops, or some issue in the way .NET Core wraps them?

If this is the case, does it seem best to simply avoid Interlocked.Exchange() and use Interlocked.CompareExchange() whenever possible?

EDIT: Another odd thing: when I run the same benchmarks with int or long rather than string, the two methods take more or less the same time. Also, I used BenchmarkDotNet's disassembly diagnoser to look at the actual assembly being generated, and found something interesting: with the int/long versions I can clearly see xchg and cmpxchg instructions, but with strings I see calls into the Interlocked.Exchange/Interlocked.CompareExchange methods instead.

EDIT2: Opened issue in coreclr: https://github.com/dotnet/coreclr/issues/16051

InBetween · Accepted Answer

Following up on my comments, this seems to be an issue with the generic overload of Exchange.

If you avoid the generic overloads altogether (by changing the type of _s and _y to object), the performance difference disappears.
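A minimal sketch of that workaround, assuming the fields can be typed as object without breaking the rest of the program. The class name ObjectFields is made up for illustration; only the two Interlocked calls come from the question. With object-typed fields, overload resolution should prefer the non-generic Exchange(ref object, object) and CompareExchange(ref object, object, object) over the generic versions:

```csharp
using System.Threading;

// Hypothetical holder class, for illustration only.
public class ObjectFields
{
    // Typed as object (not string) so the calls below bind to the
    // non-generic Interlocked overloads instead of Exchange<T>/CompareExchange<T>.
    private object _s;
    private readonly object _y = "yo";

    // Binds to Interlocked.Exchange(ref object, object);
    // stores null and returns the previous value of _s.
    public object Exchange() => Interlocked.Exchange(ref _s, null);

    // Binds to Interlocked.CompareExchange(ref object, object, object);
    // stores _y only if _s is currently null, and returns the previous value of _s.
    public object CompareExchange() => Interlocked.CompareExchange(ref _s, _y, null);
}
```

The cost of this approach is that callers get object back and have to cast to string themselves, so it trades some type safety for the cheaper call.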

The question remains, though, as to why resolving to the generic overloads only slows down Exchange. Reading through the Interlocked source code, it seems that a hack was implemented for CompareExchange<T> to make it faster. The source code comment on CompareExchange<T> follows:

 * CompareExchange<T>
 * 
 * Notice how CompareExchange<T>() uses the __makeref keyword
 * to create two TypedReferences before calling _CompareExchange().
 * This is horribly slow. Ideally we would like CompareExchange<T>()
 * to simply call CompareExchange(ref Object, Object, Object); 
 * however, this would require casting a "ref T" into a "ref Object", 
 * which is not legal in C#.
 * 
 * Thus we opted to cheat, and hacked to JIT so that when it reads
 * the method body for CompareExchange<T>() it gets back the
 * following IL:
 *
 *     ldarg.0 
 *     ldarg.1
 *     ldarg.2
 *     call System.Threading.Interlocked::CompareExchange(ref Object, Object, Object)
 *     ret
 *
 * See getILIntrinsicImplementationForInterlocked() in VM\JitInterface.cpp
 * for details.

Nothing similar is mentioned in the comments for Exchange<T>, and it also makes use of the "horribly slow" __makeref, so this could be the reason why you are seeing this unexpected behavior.

All this is, of course, my interpretation; you'd need someone from the .NET team to confirm my suspicions.
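For illustration, the forwarding that the quoted comment describes for CompareExchange<T> can be approximated in user code with Unsafe.As, which reinterprets a "ref T" as a "ref object". This is only a sketch under stated assumptions: the names InterlockedSketch and ExchangeViaObject are hypothetical, it assumes the System.Runtime.CompilerServices.Unsafe API is available (a separate NuGet package on .NET Core 2.0), and it makes no claim about how the runtime itself implements Exchange<T>.

```csharp
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical helper class, for illustration only.
public static class InterlockedSketch
{
    // Swaps a reference-typed location by forwarding to the non-generic
    // Interlocked.Exchange(ref object, object) overload.
    public static T ExchangeViaObject<T>(ref T location, T value) where T : class
    {
        // Reinterpret "ref T" as "ref object". This relies on T being a
        // reference type, so both refs point at the same object reference slot.
        ref object slot = ref Unsafe.As<T, object>(ref location);

        // slot is "ref object", so overload resolution picks the object overload.
        object previous = Interlocked.Exchange(ref slot, value);

        // The slot only ever held a T (or null), so skip the runtime type check.
        return Unsafe.As<T>(previous);
    }
}
```

Calling it looks the same as the generic API, for example `InterlockedSketch.ExchangeViaObject(ref _s, null)` with a string field. Whether it actually removes the gap seen in the benchmark would need to be measured.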

Tags: c#, .net, .net-core, interlocked, compare-and-swap

Shay Rojansky
