
C# thread-safe getter performance differences

I am writing a thread-safe object that basically represents a double and uses a lock to ensure safe reading and writing. I use many of these objects (20-30) in a piece of code that reads and writes all of them 100 times per second, and I am measuring the average computation time of each of these time steps. I looked at a few options for implementing my getter, and after running many tests and collecting many samples to average out my measurement of computation time, I find that certain implementations perform consistently better than others, but not the implementations I would expect.

Implementation 1) Computation time average = 0.607ms:

protected override double GetValue()
{
    lock(_sync)
    {
        return _value;
    }
}

Implementation 2) Computation time average = 0.615ms:

protected override double GetValue()
{
    double result;
    lock(_sync)
    {
        result = _value;
    }
    return result;
}

Implementation 3) Computation time average = 0.560ms:

protected override double GetValue()
{
    double result = 0;
    lock(_sync)
    {
        result = _value;
    }
    return result;
}

What I expected: I had expected implementation 3 to be the worst of the three (this was actually my original code, so it was chance or lazy coding that I had written it this way), but surprisingly it is consistently the best in terms of performance. I would expect implementation 1 to be the fastest. I also expected implementation 2 to be at least as fast as implementation 3, if not faster, since I am just removing an initial assignment to result that is overwritten anyway and is therefore unnecessary.

My question is: can anyone explain why these 3 implementations have the relative performance that I have measured? It seems counter-intuitive to me and I would really like to know why.

I realize that these differences are not major, but their relative measure is consistent every time I run the test, collecting thousands of samples per test to average out the computation time. Also, please keep in mind I am doing these tests because my application requires very high performance, or at least as good as I can reasonably get it. My test case is just a small test case, and my code's performance will be important when running in release.

EDIT: note that I am using MonoTouch and running the code on an iPad Mini device, so perhaps this has nothing to do with C# itself and more to do with MonoTouch's cross compiler.

Camputer asked Apr 11 '13 13:04


2 Answers

Frankly, there are other, better approaches here. The code below outputs the following (ignore the x1 run, which is only there to warm up the JIT):

x5000000
Example1        128ms
Example2        136ms
Example3        129ms
CompareExchange 53ms
ReadUnsafe      54ms
UntypedBox      23ms
TypedBox        12ms

x5000000
Example1        129ms
Example2        129ms
Example3        129ms
CompareExchange 52ms
ReadUnsafe      53ms
UntypedBox      23ms
TypedBox        12ms

x5000000
Example1        129ms
Example2        161ms
Example3        129ms
CompareExchange 52ms
ReadUnsafe      53ms
UntypedBox      23ms
TypedBox        12ms

All of these are thread-safe implementations. As you can see, the fastest is a typed box, followed by an untyped (object) box. Next, at about the same speed as each other, come Interlocked.CompareExchange and Interlocked.Read. Note that the latter only supports long, so we need to do some bit-bashing to treat that as a double.
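The bit-bashing does not have to use unsafe code. Here is a minimal sketch of the same idea using BitConverter.DoubleToInt64Bits and BitConverter.Int64BitsToDouble; the class name AtomicDouble is made up for illustration and is not part of the benchmark, and I have not timed this variant:

```csharp
using System;
using System.Threading;

// Hypothetical helper (not one of the measured experiments): a double stored
// as its raw 64-bit pattern so that Interlocked can read and write it atomically.
public sealed class AtomicDouble
{
    private long _bits;

    public AtomicDouble(double initial)
    {
        _bits = BitConverter.DoubleToInt64Bits(initial);
    }

    public double Value
    {
        // Interlocked.Read is an atomic 64-bit read, even in a 32-bit process.
        get { return BitConverter.Int64BitsToDouble(Interlocked.Read(ref _bits)); }

        // Interlocked.Exchange atomically replaces the stored bit pattern.
        set { Interlocked.Exchange(ref _bits, BitConverter.DoubleToInt64Bits(value)); }
    }
}
```

Usage would be along the lines of creating new AtomicDouble(3.0) once and then reading or assigning its Value property from any thread. Whether the BitConverter calls cost more than the pointer casts depends on the runtime, so measure it on your target.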

Obviously, test on your target framework.

For fun, I also tested a Mutex; on the same scale test, that takes about 3300ms.
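The Mutex test is not part of the listing. A Mutex-guarded getter could look roughly like the sketch below; the class name MutexExample is an assumption for illustration and this is not necessarily the exact code that was measured. A Mutex is a wait handle that can also be used across processes, which is much heavier than the Monitor behind lock, so the large gap is not surprising:

```csharp
using System.Threading;

// Hypothetical sketch of a Mutex-based getter, shown only for comparison
// with the lock-based examples; not taken from the benchmark itself.
public sealed class MutexExample
{
    private readonly Mutex _mutex = new Mutex();
    private double _value = 3;

    public double GetValue()
    {
        _mutex.WaitOne();           // acquire the mutex (blocks if another thread holds it)
        try
        {
            return _value;
        }
        finally
        {
            _mutex.ReleaseMutex();  // always release, even if the read throws
        }
    }
}
```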

using System;
using System.Diagnostics;
using System.Threading;
abstract class Experiment
{
    public abstract double GetValue();
}
class Example1 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
        lock (_sync)
        {
            return _value;
        }
    }
}
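class Example2 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    // Note: as written, this body is identical to Example1. The question's
    // implementation 2 copies _value into an unassigned local before returning.
    public override double GetValue()
    {
        lock (_sync)
        {
            return _value;
        }
    }
}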

class Example3 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
        double result = 0;
        lock (_sync)
        {
            result = _value;
        }
        return result;
    }
}

class CompareExchange : Experiment
{
    private double _value = 3;
    public override double GetValue()
    {
        return Interlocked.CompareExchange(ref _value, 0, 0);
    }
}
class ReadUnsafe : Experiment
{
    private long _value = DoubleToInt64(3);
    static unsafe long DoubleToInt64(double val)
    {   // I'm mainly including this for the field initializer
        // in real use this would be manually inlined
        return *(long*)(&val);
    }
    public override unsafe double GetValue()
    {
        long val = Interlocked.Read(ref _value);
        return *(double*)(&val);
    }
}
class UntypedBox : Experiment
{
    // references are always atomic
    private volatile object _value = 3.0;
    public override double GetValue()
    {
        return (double)_value;
    }
}
class TypedBox : Experiment
{
    private sealed class Box
    {
        public readonly double Value;
        public Box(double value) { Value = value; }

    }
    // references are always atomic
    private volatile Box _value = new Box(3);
    public override double GetValue()
    {
        return _value.Value;
    }
}
static class Program
{
    static void Main()
    {
        // once for JIT
        RunExperiments(1);
        // three times for real
        RunExperiments(5000000);
        RunExperiments(5000000);
        RunExperiments(5000000);
    }
    static void RunExperiments(int loop)
    {
        Console.WriteLine("x{0}", loop);
        RunExperiment(new Example1(), loop);
        RunExperiment(new Example2(), loop);
        RunExperiment(new Example3(), loop);
        RunExperiment(new CompareExchange(), loop);
        RunExperiment(new ReadUnsafe(), loop);
        RunExperiment(new UntypedBox(), loop);
        RunExperiment(new TypedBox(), loop);
        Console.WriteLine();
    }
    static void RunExperiment(Experiment test, int loop)
    {
        // avoid any GC interruptions
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();

        double val = 0;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < loop; i++)
            val = test.GetValue();
        watch.Stop();
        if (val != 3.0) Console.WriteLine("FAIL!");
        Console.WriteLine("{0}\t{1}ms", test.GetType().Name,
            watch.ElapsedMilliseconds);

    }

}
Marc Gravell answered Nov 19 '22 20:11


Measuring only reads is misleading for a concurrency test: with no writes, your cache will give you results that are orders of magnitude better than a real use case would. So I added SetValue to Marc's example:

using System;
using System.Diagnostics;
using System.Threading;

abstract class Experiment
{
    public abstract double GetValue();
    public abstract void SetValue(double value);
}

class Example1 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
        lock (_sync)
        {
            return _value;
        }
    }

    public override void SetValue(double value)
    {
        lock (_sync)
        {
            _value = value;
        }

    }

}
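class Example2 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    // Note: as written, this body is identical to Example1. The question's
    // implementation 2 copies _value into an unassigned local before returning.
    public override double GetValue()
    {
        lock (_sync)
        {
            return _value;
        }
    }

    public override void SetValue(double value)
    {
        lock (_sync)
        {
            _value = value;
        }
    }
}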



class Example3 : Experiment
{
    private readonly object _sync = new object();
    private double _value = 3;
    public override double GetValue()
    {
        double result = 0;
        lock (_sync)
        {
            result = _value;
        }
        return result;
    }

    public override void SetValue(double value)
    {
        lock (_sync)
        {
            _value = value;
        }
    }
}

class CompareExchange : Experiment
{
    private double _value = 3;
    public override double GetValue()
    {
        return Interlocked.CompareExchange(ref _value, 0, 0);
    }

    public override void SetValue(double value)
    {
        Interlocked.Exchange(ref _value, value);
    }
}
class ReadUnsafe : Experiment
{
    private long _value = DoubleToInt64(3);
    static unsafe long DoubleToInt64(double val)
    {   // I'm mainly including this for the field initializer
        // in real use this would be manually inlined
        return *(long*)(&val);
    }
    public override unsafe double GetValue()
    {
        long val = Interlocked.Read(ref _value);
        return *(double*)(&val);
    }

    public override void SetValue(double value)
    {
        long intValue = DoubleToInt64(value);
        Interlocked.Exchange(ref _value, intValue);
    }
}
class UntypedBox : Experiment
{
    // references are always atomic
    private volatile object _value = 3.0;
    public override double GetValue()
    {
        return (double)_value;
    }

    public override void SetValue(double value)
    {
        object valueObject = value;
        _value = valueObject;
    }
}
class TypedBox : Experiment
{
    private sealed class Box
    {
        public readonly double Value;
        public Box(double value) { Value = value; }

    }
    // references are always atomic
    private volatile Box _value = new Box(3);
    public override double GetValue()
    {
        Box value = _value;
        return value.Value;
    }

    public override void SetValue(double value)
    {
        Box boxValue = new Box(value);
        _value = boxValue;
    }
}
static class Program
{
    static void Main()
    {
        // once for JIT
        RunExperiments(1);
        // three times for real
        RunExperiments(5000000);
        RunExperiments(5000000);
        RunExperiments(5000000);
    }
    static void RunExperiments(int loop)
    {
        Console.WriteLine("x{0}", loop);
        RunExperiment(new Example1(), loop);
        RunExperiment(new Example2(), loop);
        RunExperiment(new Example3(), loop);
        RunExperiment(new CompareExchange(), loop);
        RunExperiment(new ReadUnsafe(), loop);
        RunExperiment(new UntypedBox(), loop);
        RunExperiment(new TypedBox(), loop);
        Console.WriteLine();
    }
    static void RunExperiment(Experiment test, int loop)
    {
        // avoid any GC interruptions
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();

        int threads = Environment.ProcessorCount;

        ManualResetEvent done = new ManualResetEvent(false);

        // Since we use several threads, split the original workload
        // evenly across them so the total work stays the same
        int workerLoop = Math.Max(1, loop / Environment.ProcessorCount);
        int writeRatio = 1000;
        int writes = Math.Max(workerLoop / writeRatio, 1);
        int reads = workerLoop / writes;

        var watch = Stopwatch.StartNew();

        for (int t = 0; t < Environment.ProcessorCount; ++t)
        {
            ThreadPool.QueueUserWorkItem((state) =>
                {
                    try
                    {
                        double val = 0;

                        // Nested loops (one write, then a batch of reads) so the
                        // inner loop needs no % check to decide when to write
                        for (int j = 0; j < writes; ++j)
                        {
                            test.SetValue(j);
                            for (int i = 0; i < reads; i++)
                            {
                                val = test.GetValue();
                            }
                        }
                    }
                    finally
                    {
                        if (0 == Interlocked.Decrement(ref threads))
                        {
                            done.Set();
                        }
                    }
                });
        }
        done.WaitOne();
        watch.Stop();
        Console.WriteLine("{0}\t{1}ms", test.GetType().Name,
            watch.ElapsedMilliseconds);

    }
}

Results at a 1000:1 read:write ratio:

x5000000
Example1        353ms
Example2        395ms
Example3        369ms
CompareExchange 150ms
ReadUnsafe      161ms
UntypedBox      11ms
TypedBox        9ms

100:1 (read:write)

x5000000
Example1        356ms
Example2        360ms
Example3        356ms
CompareExchange 161ms
ReadUnsafe      172ms
UntypedBox      14ms
TypedBox        13ms

10:1 (read:write)

x5000000
Example1        383ms
Example2        394ms
Example3        414ms
CompareExchange 169ms
ReadUnsafe      176ms
UntypedBox      41ms
TypedBox        43ms

2:1 (read:write)

x5000000
Example1        550ms
Example2        581ms
Example3        560ms
CompareExchange 257ms
ReadUnsafe      292ms
UntypedBox      101ms
TypedBox        122ms

1:1 (read:write)

x5000000
Example1        718ms
Example2        745ms
Example3        730ms
CompareExchange 381ms
ReadUnsafe      376ms
UntypedBox      161ms
TypedBox        200ms

*Updated the code to remove the unnecessary ICX (Interlocked.CompareExchange) operations on write, since the value is always overwritten. Also fixed the formula that computes the number of reads so that it divides by the number of threads (same total work).
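To illustrate the distinction: a plain overwrite does not care about the old value, so a single Interlocked.Exchange is enough. A compare-exchange retry loop is only needed when the new value depends on the old one. The sketch below is not part of the benchmark; the class AtomicDoubleCell and its Add method are hypothetical names used only to show the two cases side by side:

```csharp
using System;
using System.Threading;

// Hypothetical illustration, not one of the measured experiments.
public sealed class AtomicDoubleCell
{
    private double _value;

    public AtomicDoubleCell(double initial) { _value = initial; }

    // Atomic read, using the same trick as the CompareExchange experiment.
    public double Read()
    {
        return Interlocked.CompareExchange(ref _value, 0, 0);
    }

    // Blind overwrite: the previous value is irrelevant, so no retry loop.
    public void Write(double value)
    {
        Interlocked.Exchange(ref _value, value);
    }

    // Read-modify-write: retry until no other thread changed the value
    // between our read and our swap.
    public double Add(double delta)
    {
        while (true)
        {
            double seen = Read();
            double updated = seen + delta;
            double previous = Interlocked.CompareExchange(ref _value, updated, seen);
            // Compare bit patterns rather than using ==, so that NaN and
            // signed zero cannot confuse the success check.
            if (BitConverter.DoubleToInt64Bits(previous) == BitConverter.DoubleToInt64Bits(seen))
                return updated;
        }
    }
}
```

The benchmark's SetValue only ever needs the Write case, which is why the ICX on the write path could go.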

Remus Rusanu answered Nov 19 '22 20:11