I was debugging a multi-threaded application and looked into the internal structure of CRITICAL_SECTION. The LockSemaphore data member of CRITICAL_SECTION struck me as an interesting one.
It looks like LockSemaphore is an auto-reset event (not a semaphore, as the name suggests), and the operating system creates this event silently the first time a thread waits on a critical section that is locked by another thread.
Now I am wondering: is a critical section always faster? An event is a kernel object, and each critical section object is associated with an event object, so how can a critical section be faster than other kernel objects like a mutex? And how does the internal event object actually affect the performance of a critical section?
Here is the structure of the CRITICAL_SECTION:

struct RTL_CRITICAL_SECTION {
    PRTL_CRITICAL_SECTION_DEBUG DebugInfo;
    LONG LockCount;
    LONG RecursionCount;
    HANDLE OwningThread;
    HANDLE LockSemaphore;
    ULONG_PTR SpinCount;
};
Windows is designed so that threads, even when running in kernel mode, are always preemptible and always interruptible. So critical sections are certainly not implemented by disabling interrupts and do not prevent preemption.
Only one thread can execute a particular critical section at any given time.
When they say that a critical section is "fast", they mean "it's cheap to acquire one when it isn't already locked by another thread".
[Note that if it is already locked by another thread, then it doesn't matter nearly so much how fast it is.]
The reason it's fast is that, before going into the kernel, it uses the equivalent of InterlockedIncrement on one of those LONG fields (probably the LockCount field), and if that succeeds, it considers the lock acquired without ever having entered the kernel.
The InterlockedIncrement API is, I believe, implemented in user mode as a LOCK INC instruction ... in other words, you can acquire an uncontested critical section without any ring transition into the kernel at all.
In performance work, few things fall into the "always" category :) If you implement something similar to an OS critical section yourself using other primitives, odds are it will be slower in most cases.
The best way to answer your question is with performance measurements. How OS objects perform is very dependent on the scenario. For example, critical sections are generally considered 'fast' if contention is low. They are also considered fast if the lock hold time is shorter than the spin-count time.
The most important thing to determine is whether contention on a critical section is the first-order limiting factor in your application. If not, then simply use a critical section normally and work on your application's primary bottleneck (or bottlenecks).
If critical section performance is critical, then there are several tuning approaches you can consider.
In summary: tuning scenarios that have lock contention can be challenging (but interesting!) work. Focus on measuring your application's performance and understanding where your hot paths are. The xperf tools in the Windows Performance Toolkit are your friend here :) We just released version 4.5 in the Microsoft Windows SDK for Windows 7 and .NET Framework 3.5 SP1 (available as an ISO or via the web installer), and there is a dedicated forum for the xperf tools. V4.5 fully supports Win7, Vista, and Windows Server 2008 in all versions.