I am writing a very thread-intensive application that hangs when it exits.
I've traced into the system units and found the place where the program enters an infinite loop. It's in SysUtils line 19868 -> DoneMonitorSupport -> CleanEventList:
repeat until InterlockedCompareExchange(EventCache[I].Lock, 1, 0) = 0;
I've searched for a solution online and found a couple of QC reports:
Unfortunately, these don't seem to relate to my situation as I don't use either TThreadList or TMonitor.
I'm pretty sure that all my threads have finished and have been destroyed, as they all inherit from a base thread class that keeps a create/destroy count.
Has anybody come across similar behaviour before? Do you know of any strategies for discovering where the root cause may lie?
I've been looking at how the TMonitor locks are implemented, and I finally made an interesting discovery. For a bit of drama, I'll first tell you how the locks work.
When you call any TMonitor function on a TObject, a new instance of the TMonitor record is created and that instance is assigned to a MonitorFld inside the object itself. This assignment is made in a thread-safe way, using InterlockedCompareExchangePointer. Because of this trick the TObject only contains one pointer-sized amount of data for the support of TMonitor; it doesn't contain the full TMonitor structure. And that's a good thing.
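The lazy, thread-safe assignment described above can be sketched roughly as follows. This is only an illustration of the compare-exchange pattern, not the RTL's actual code: TFakeMonitor and GetOrCreateMonitor are hypothetical names, and the unit is assumed to be Windows (Winapi.Windows in newer Delphi versions).

```delphi
program LazyInitSketch;
{$APPTYPE CONSOLE}

uses
  Windows; // assumption: named Winapi.Windows in newer Delphi versions

type
  // Hypothetical stand-in for the per-object TMonitor record.
  PFakeMonitor = ^TFakeMonitor;
  TFakeMonitor = record
    LockCount: Integer;
  end;

// Returns the record stored in Slot, allocating it on first use.
function GetOrCreateMonitor(var Slot: Pointer): PFakeMonitor;
var
  Fresh: PFakeMonitor;
begin
  Result := Slot;
  if Result = nil then
  begin
    New(Fresh);
    Fresh^.LockCount := 0;
    // Store Fresh only if Slot is still nil; the previous value is returned.
    Result := InterlockedCompareExchangePointer(Slot, Fresh, nil);
    if Result = nil then
      Result := Fresh   // this thread won the race
    else
      Dispose(Fresh);   // another thread got there first; discard ours
  end;
end;

var
  Slot: Pointer = nil;
begin
  // Both calls must yield the same record.
  Writeln(GetOrCreateMonitor(Slot) = GetOrCreateMonitor(Slot));
  Dispose(PFakeMonitor(Slot));
end.
```

Whichever thread loses the race simply throws its allocation away, which is why the object itself never needs more than a single pointer-sized slot.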
This TMonitor structure contains a number of fields. We'll start with the FLockCount: Integer field. When the first thread uses TMonitor.Enter() on any object, this combined lock-counter field will have the value zero. Again using an InterlockedCompareExchange call, the lock is acquired and the counter is initialized. There will be no blocking for the calling thread and no context switch, since this is all done in-process.
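For reference, the uncontended case looks like this from the caller's side. It is a minimal sketch, assuming a Delphi version that ships TMonitor in the System unit; the program and variable names are only for illustration.

```delphi
program MonitorBasics;
{$APPTYPE CONSOLE}

var
  Lock: TObject; // any TObject instance can serve as a lock

begin
  Lock := TObject.Create;
  try
    TMonitor.Enter(Lock); // no other thread holds the lock, so this does not block
    try
      Writeln('inside the critical section');
    finally
      TMonitor.Exit(Lock);
    end;
  finally
    Lock.Free;
  end;
end.
```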
When the second thread tries to TMonitor.Enter() the same object, its first attempt to lock will fail. When that happens Delphi follows two strategies:
If TMonitor.SetSpinCount() was used to set a number of "spins", then Delphi will do a busy-wait loop, spinning the given number of times. That's very nice for tiny locks because it allows acquiring the lock without doing a context switch.
If spinning does not acquire the lock (or no spin count was set), TMonitor.Enter() will initiate a Wait on the event returned by TMonitor.GetEvent(). In other words it will not busy-wait wasting CPU cycles. Remember the TMonitor.GetEvent() because that's very important.
Let's say we've got a thread that acquired the lock and a thread that tried to acquire the lock but is now waiting on the event returned by TMonitor.GetEvent. When the first thread calls TMonitor.Exit() it will notice (via the FLockCount field) that there is at least one other thread blocking. So it immediately pulses what should normally be the previously allocated event (calls TMonitor.GetEvent()). But since the two threads, the one that calls TMonitor.Exit() and the one that called TMonitor.Enter(), might actually call TMonitor.GetEvent() at the same time, there are a couple more tricks inside TMonitor.GetEvent() to make sure that only one event is allocated, regardless of the order of operations.
For a few more fun moments we'll now delve into the way TMonitor.GetEvent() works. This thing lives inside the System unit (you know, the one we can't recompile to play with), but it turns out it delegates the duty of actually allocating the Event to another unit, through the System.MonitorSupport pointer. That points to a record of type TMonitorSupport that declares 5 function pointers:
NewSyncObject - allocates a new Event for Synchronization purposes
FreeSyncObject - deallocates the Event allocated for Synchronization purposes
NewWaitObject - allocates a new Event for Wait operations
FreeWaitObject - deallocates that Wait event
WaitAndOrSignalObject - well.. waits or signals.
It also turns out that the objects returned by the NewXYZ functions could be anything, because they're only used for the call to WaitXYZ and for the corresponding call to FreeXyzObject. The way those functions are implemented in SysUtils is designed to provide those locks with a minimum amount of locking and context switching. Because of that, the objects themselves (returned by NewSyncObject and NewWaitObject) are not directly the Events returned by CreateEvent(), but pointers to records in the SyncEventCacheArray. It goes even further: actual Windows Events are not created until required. Because of that, the records in the SyncEventCacheArray contain a couple of fields:
TSyncEventItem.Lock - this tells Delphi whether the Lock is being used for anything right now or not, and
TSyncEventItem.Event - this holds the actual Event that'll be used for synchronization, if waiting is required.
When the application terminates, SysUtils.DoneMonitorSupport goes over all the records in the SyncEventCacheArray and waits for the Lock to become zero, i.e. waits for the lock to stop being used by anything. Theoretically, as long as that lock is NOT zero, at least one thread out there might be using the lock, so the sane thing to do would be to wait, in order to NOT cause access violations. And we finally got to our current question: HANGING in SysUtils.DoneMonitorSupport.
It hangs because at least one Event allocated using NewSyncObject or NewWaitObject was not freed using its corresponding FreeSyncObject or FreeWaitObject. And we go back to the TMonitor.GetEvent() routine. The Event it allocates is saved in the TMonitor record that corresponds to the object that was used for TMonitor.Enter(). The pointer to that record is only kept in that object's instance data, and is kept there for the life of the application. Searching for the name of the field, FLockEvent, we find this in the System.pas file:
procedure TMonitor.Destroy;
begin
  if (MonitorSupport <> nil) and (FLockEvent <> nil) then
    MonitorSupport.FreeSyncObject(FLockEvent);
  Dispose(@Self);
end;
and a call to that record destructor in here: procedure TObject.CleanupInstance.
In other words, the final sync-event is only released when the object that was used for synchronization is freed!
The application hangs because at least one OBJECT that was used for TMonitor.Enter() was not freed.
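In practice this means every object passed to TMonitor.Enter() needs an owner that frees it before shutdown. A minimal sketch of that pattern follows; TSharedCounter is a hypothetical class used only for illustration, and the sketch assumes all threads have stopped using the lock by the time the owner is destroyed.

```delphi
program FreeLockObject;
{$APPTYPE CONSOLE}

type
  // Hypothetical owner of a lock object.
  TSharedCounter = class
  private
    FLock: TObject;
    FValue: Integer;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Increment;
    property Value: Integer read FValue;
  end;

constructor TSharedCounter.Create;
begin
  inherited Create;
  FLock := TObject.Create;
end;

destructor TSharedCounter.Destroy;
begin
  // Freeing the lock object runs TObject.CleanupInstance, which destroys
  // its TMonitor record and returns any sync event that was allocated.
  FLock.Free;
  inherited Destroy;
end;

procedure TSharedCounter.Increment;
begin
  TMonitor.Enter(FLock);
  try
    Inc(FValue);
  finally
    TMonitor.Exit(FLock);
  end;
end;

var
  Counter: TSharedCounter;
begin
  Counter := TSharedCounter.Create;
  try
    Counter.Increment;
    Writeln(Counter.Value);
  finally
    Counter.Free; // also frees the lock object
  end;
end.
```

Tying the lock's lifetime to its owner's destructor makes it harder to leak the lock object by accident.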
Unfortunately I don't like this. It's not right: the penalty for not freeing a small object should be a small memory leak, not a hanging application! This is especially bad for service applications, where a service might simply hang forever, not fully shut down but unable to respond to any request.
The solutions for the Delphi team? They should NOT hang in the finalization code of the SysUtils unit, no matter what. They should probably ignore the Lock and move on to closing the Event handle. At that stage (finalization of the SysUtils unit), if there's still code running in some thread, it's in really bad shape: most of the units have been finalized, so it's not running in the environment it was designed to run in.
For the Delphi users? We can replace the MonitorSupport with our own version, one that doesn't do those extensive tests at finalization time.