 

C# lock statement performance

Update - I have found the cause of lock() eating CPU cycles like crazy. I added this information after my original question. This all turned out to be a wall of text so:

TL;DR The C# built-in lock() mechanism will, under some circumstances, use an unusual amount of CPU time if your system is running with a high resolution system timer.

Original question:

I have an application that accesses a resource from multiple threads. The resource is a device attached to USB. It's a simple command/response interface and I use a small lock() block to ensure that the thread that sends a command also gets the response. My implementation uses the lock(obj) keyword:

lock (threadLock)
{
    WriteLine(commandString);
    rawResponse = ReadLine();
}

When I access this from 3 threads as fast as possible (in a tight loop) the CPU usage is about 24% on a high-end computer. Due to the nature of the USB port only about 1000 command/response operations are performed per second. Then I implemented the lock mechanism described here SimpleExclusiveLock and the code now looks similar to this (some try/catch stuff to release the lock in case of an I/O exception is removed):

Lock.Enter();
WriteLine(commandString);
rawResponse = ReadLine();
Lock.Exit();
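For reference, the exception handling I stripped out looks roughly like this (a sketch of my own, using the same Lock, WriteLine and ReadLine members as above):

```csharp
Lock.Enter();
try
{
    WriteLine(commandString);
    rawResponse = ReadLine();
}
finally
{
    // Release the lock even if the I/O throws, so waiting threads don't deadlock.
    Lock.Exit();
}
```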

Using this implementation the CPU usage drops to <1% with the same 3 thread test program while still getting the 1000 command/response operations per second.

The question is: What, in this case, is the problem using the built-in lock() keyword?

Have I accidentally stumbled upon a case where the lock() mechanism has exceptionally high overhead? The thread that enters the critical section will hold the lock for only about 1 ms.

Update: The cause of lock() eating CPU like crazy is that some application has increased the timer resolution for the whole system using timeBeginPeriod() in winmm.dll. The culprits in my case are Google Chrome and SQL Server - they requested a 1 ms system timer resolution using:

[DllImport("winmm.dll", EntryPoint = "timeBeginPeriod", SetLastError = true)]
private static extern uint TimeBeginPeriod(uint uMilliseconds);
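For completeness, a program that raises the timer resolution this way is supposed to pair the call with timeEndPeriod from the same DLL when it's done. A minimal sketch (not code from my application, just an illustration of the pattern):

```csharp
using System.Runtime.InteropServices;

class TimerResolution
{
    [DllImport("winmm.dll", EntryPoint = "timeBeginPeriod", SetLastError = true)]
    private static extern uint TimeBeginPeriod(uint uMilliseconds);

    [DllImport("winmm.dll", EntryPoint = "timeEndPeriod", SetLastError = true)]
    private static extern uint TimeEndPeriod(uint uMilliseconds);

    static void Main()
    {
        TimeBeginPeriod(1);   // raise the system timer resolution to 1 ms (system-wide!)
        try
        {
            // ... time-sensitive work ...
        }
        finally
        {
            TimeEndPeriod(1); // must match the earlier TimeBeginPeriod call
        }
    }
}
```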

I found this out by using the powercfg tool:

powercfg -energy duration 5 

Due to some sort of design flaw in the built-in lock() statement, this increased timer resolution eats CPU like crazy (at least in my case). So, I killed the programs that request the high resolution system timer. My application now runs a bit slower: each request will now hold the lock for 16.5 ms instead of 1 ms. My guess is that the threads are scheduled less frequently at the coarser timer resolution. The CPU usage (as shown in Task Manager) also dropped to zero. I have no doubt that lock() still uses quite a few cycles, but that is now hidden.

In my project low CPU use is an important design factor. The low 1 ms latency of USB requests is also a plus for the overall design. So (in my case) the solution is to discard the built-in lock() and replace it with a properly implemented lock mechanism. I already threw out the flawed System.IO.Ports.SerialPort in favor of WinUSB so I have no fears :)

I made a small console-application to demonstrate all of this, pm me if you are interested in a copy (~100 lines of code).
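The gist of such a test can be sketched in a few lines (my own minimal reconstruction, not the actual program; Thread.Sleep(1) stands in for the ~1 ms USB round-trip):

```csharp
using System;
using System.Threading;

class LockCpuDemo
{
    static readonly object threadLock = new object();

    static void Worker()
    {
        while (true)
        {
            lock (threadLock)
            {
                Thread.Sleep(1); // simulate the ~1 ms USB command/response
            }
        }
    }

    static void Main()
    {
        // Three threads contending on one lock, as in the original test.
        for (int i = 0; i < 3; i++)
            new Thread(Worker) { IsBackground = true }.Start();

        Console.WriteLine("Watch CPU usage in Task Manager; press Enter to quit.");
        Console.ReadLine();
    }
}
```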

I guess I answered my own question so I'll just leave this here in case someone is interested...

Asked by Mikael, Apr 02 '14 08:04


1 Answer

No, sorry, this is not possible. There's no scenario where 3 threads, with 2 of them blocking on the lock and 1 blocking on an I/O operation that takes a millisecond, can get you 24% CPU utilization. The linked article is perhaps interesting, but the .NET Monitor class does the exact same thing, including the CompareExchange() optimization and the wait queue.
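For context, the C# compiler expands a lock statement into Monitor calls, so both approaches boil down to the same primitives. Roughly (the C# 4.0+ expansion):

```csharp
// lock (threadLock) { ... } compiles to approximately:
bool lockTaken = false;
try
{
    Monitor.Enter(threadLock, ref lockTaken);
    // ... protected region ...
}
finally
{
    if (lockTaken)
        Monitor.Exit(threadLock);
}
```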

The only way you can get to 24% is through other code that runs in your program, the common cycle stealer being the UI thread that you pummel a thousand times per second. Very easy to burn core that way, and a classic mistake: human eyes can't read that fast. The further extrapolation is that you then wrote a test program that doesn't update the UI, and thus doesn't burn core.

A profiler will of course tell you exactly where those cycles go. It should be your next step.

Answered by Hans Passant, Sep 29 '22 10:09