
Application hangs in SysUtils -> DoneMonitorSupport on exit

I am writing a very thread intensive application that hangs when it exits.

I've traced into the system units and found the place where the program enters an infinite loop. It's in SysUtils line 19868 -> DoneMonitorSupport -> CleanEventList:

repeat until InterlockedCompareExchange(EventCache[I].Lock, 1, 0) = 0;

I've searched for a solution online and found a couple of QC reports:

  • http://qc.embarcadero.com/wc/qcmain.aspx?d=95194
  • http://qc.embarcadero.com/wc/qcmain.aspx?d=90487

Unfortunately, these don't seem to relate to my situation as I don't use either TThreadList or TMonitor.

I'm pretty sure that all my threads have finished and been destroyed, as they all inherit from a base thread that keeps a create/destroy count.

Has anybody come across similar behaviour before? Do you know of any strategies for discovering where the root cause may lie?

norgepaul asked Jan 08 '13 14:01




1 Answer

I've been looking at how the TMonitor locks are implemented, and I finally made an interesting discovery. For a bit of drama, I'll first tell you how the locks work.

When you call any TMonitor function on a TObject, a new instance of the TMonitor record is created and that instance is assigned to a MonitorFld inside the object itself. This assignment is made in a thread-safe way, using InterlockedCompareExchangePointer. Because of this trick the TObject only carries one pointer-sized piece of data to support TMonitor, not the full TMonitor structure. And that's a good thing.
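The lazy, thread-safe assignment described above can be sketched as follows. This is only an illustration of the pattern, not the RTL source: TSketchMonitor, GetOrCreateMonitor and ObjectSlot are hypothetical names, and TInterlocked.CompareExchange from System.SyncObjs stands in for the InterlockedCompareExchangePointer call the RTL uses.

```delphi
program LazyMonitorSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs;

type
  // Hypothetical stand-in for the real TMonitor record.
  PSketchMonitor = ^TSketchMonitor;
  TSketchMonitor = record
    LockCount: Integer;
  end;

var
  // Hypothetical stand-in for the pointer-sized slot inside each object.
  ObjectSlot: Pointer = nil;

function GetOrCreateMonitor(var Slot: Pointer): PSketchMonitor;
var
  Fresh: PSketchMonitor;
begin
  if Slot = nil then
  begin
    New(Fresh);
    Fresh.LockCount := 0;
    // Publish Fresh only if the slot is still nil. If another thread
    // got there first, throw our copy away and use the winner's.
    if TInterlocked.CompareExchange(Slot, Fresh, nil) <> nil then
      Dispose(Fresh);
  end;
  Result := Slot;
end;

begin
  // Both calls return the same record: the second call finds the slot filled.
  if GetOrCreateMonitor(ObjectSlot) = GetOrCreateMonitor(ObjectSlot) then
    Writeln('single monitor record allocated');
  Dispose(PSketchMonitor(ObjectSlot));
end.
```

Whichever thread wins the compare-exchange publishes its record; every other thread discards its own allocation, so the slot never holds more than one record.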

This TMonitor structure contains a number of fields. We'll start with the FLockCount: Integer field. When the first thread calls TMonitor.Enter() on any object, this combined lock-counter field has the value zero. Again using an InterlockedCompareExchange call, the lock is acquired and the counter is initialized. The calling thread does not block and there is no context switch, since this is all done in-process.

When the second thread tries to TMonitor.Enter() the same object, its first attempt to lock will fail. When that happens Delphi follows two strategies:

  • If the developer used TMonitor.SetSpinCount() to set a number of "spins", then Delphi will do a busy-wait loop, spinning the given number of times. That's very nice for tiny locks because it allows acquiring the lock without doing a context-switch.
  • If the spin count expires (or there's no spin count; by default the spin count is zero), TMonitor.Enter() will initiate a Wait on the event returned by TMonitor.GetEvent(). In other words it will not busy-wait wasting CPU cycles. Remember the TMonitor.GetEvent() because that's very important.
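As a minimal sketch of the first strategy, assuming a Delphi version with namespaced unit names (XE2 or later): the spin count is set per object, before the lock is contended. The value 1000 and the Guard and Counter names are arbitrary illustrations.

```delphi
program SpinCountSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Guard: TObject;
  Counter: Integer = 0;
begin
  Guard := TObject.Create;
  try
    // Ask TMonitor to spin up to 1000 times before falling back to
    // waiting on the event. 1000 is an arbitrary illustrative value.
    TMonitor.SetSpinCount(Guard, 1000);

    TMonitor.Enter(Guard);
    try
      Inc(Counter); // keep the protected section tiny so spinning pays off
    finally
      TMonitor.Exit(Guard);
    end;
  finally
    Guard.Free; // freeing the lock object also releases its TMonitor data
  end;
end.
```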

Let's say we've got a thread that acquired the lock and a thread that tried to acquire the lock but is now waiting on the event returned by TMonitor.GetEvent(). When the first thread calls TMonitor.Exit() it will notice (via the FLockCount field) that there is at least one other thread blocking. So it immediately pulses what should normally be the previously allocated event (it calls TMonitor.GetEvent()). But since the two threads, the one that calls TMonitor.Exit() and the one that called TMonitor.Enter(), might actually call TMonitor.GetEvent() at the same time, there are a couple more tricks inside TMonitor.GetEvent() to make sure that only one event is allocated, regardless of the order of operations.

For a few more fun moments we'll now delve into the way TMonitor.GetEvent() works. This thing lives inside the System unit (you know, the one we can't recompile to play with), but it turns out it delegates the duty of actually allocating the Event to another unit, through the System.MonitorSupport pointer. That points to a record of type TMonitorSupport that declares 5 function pointers:

  • NewSyncObject - allocates a new Event for Synchronization purposes
  • FreeSyncObject - deallocates the Event allocated for Synchronization purposes
  • NewWaitObject - allocates a new Event for Wait operations
  • FreeWaitObject - deallocates that Wait event
  • WaitAndOrSignalObject - well.. waits or signals.

It also turns out that the objects returned by the NewXYZ functions could be anything, because they're only used for the call to WaitXYZ and for the corresponding call to FreeXyzObject. The implementations in SysUtils are designed to provide those locks with a minimum of locking and context switching. Because of that, the objects themselves (returned by NewSyncObject and NewWaitObject) are not the Events returned by CreateEvent() directly, but pointers to records in the SyncEventCacheArray. It goes even further: actual Windows Events are not created until required. To support that, each record in the SyncEventCacheArray contains a couple of fields:

  • TSyncEventItem.Lock - this tells Delphi whether the item is being used for anything right now, and
  • TSyncEventItem.Event - this holds the actual Event that'll be used for synchronization, if waiting is required.

When the application terminates, SysUtils.DoneMonitorSupport goes over all the records in the SyncEventCacheArray and waits for each Lock to become zero, i.e. waits for the item to stop being used by anything. Theoretically, as long as that lock is not zero, at least one thread out there might be using it, so the sane thing to do would be to wait, in order not to cause access violations. And that finally brings us to the question at hand: hanging in SysUtils.DoneMonitorSupport.
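The shutdown wait can be pictured with a small simulation. This is a sketch under stated assumptions, not the SysUtils code: TSketchItem, Cache and TryDrain are hypothetical names, and the loop is deliberately capped at MaxSpins so the demo terminates, whereas the loop quoted in the question has no such cap.

```delphi
program CacheDrainSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs;

type
  // Hypothetical stand-in for a cache record with Lock and Event fields.
  TSketchItem = record
    Lock: Integer;
    Event: Pointer;
  end;

var
  Cache: array[0..3] of TSketchItem; // globals start zeroed: all unlocked

// Tries to take every item's lock, the way the shutdown loop does, but
// gives up after MaxSpins attempts instead of spinning forever.
function TryDrain(MaxSpins: Integer): Boolean;
var
  I, Spins: Integer;
begin
  for I := Low(Cache) to High(Cache) do
  begin
    Spins := 0;
    // Succeeds only when Lock was 0; a leaked item stays at 1 forever.
    while TInterlocked.CompareExchange(Cache[I].Lock, 1, 0) <> 0 do
    begin
      Inc(Spins);
      if Spins >= MaxSpins then
        Exit(False);
    end;
  end;
  Result := True;
end;

begin
  Cache[2].Lock := 1; // simulate an item that was handed out and never returned
  if TryDrain(1000) then
    Writeln('all items drained')
  else
    Writeln('an item is still marked in use; an uncapped loop would spin here forever');
end.
```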

Why might an application hang in SysUtils.DoneMonitorSupport even if all its threads terminated properly?

Because at least one Event allocated using NewSyncObject or NewWaitObject was not freed using its corresponding FreeSyncObject or FreeWaitObject. And we go back to the TMonitor.GetEvent() routine. The Event it allocates is saved in the TMonitor record that corresponds to the object that was used for TMonitor.Enter(). The pointer to that record is only kept in that object's instance data, and is kept there for the life of the application. Searching for the name of the field, FLockEvent, we find this in the System.pas file:

procedure TMonitor.Destroy;
begin
  if (MonitorSupport <> nil) and (FLockEvent <> nil) then
    MonitorSupport.FreeSyncObject(FLockEvent);
  Dispose(@Self);
end;

and a call to that record-destructor in here: procedure TObject.CleanupInstance.

In other words, the final sync-event is only released when the object that was used for synchronization is freed!

Answer to OP's question:

The application hangs because at least one OBJECT that was used for TMonitor.Enter() was not freed.
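A minimal sketch of the scenario, assuming Delphi XE2 or later (namespaced units, TThread.CreateAnonymousThread). Guard and Worker are illustrative names, and the Sleep(200) is a crude way to make sure the second thread really contends for the lock so that a lock event gets allocated. The important line is the final Guard.Free: according to the explanation above, leaving it out is what leaves the event's cache entry marked as in use at shutdown.

```delphi
program MonitorLeakDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  Guard: TObject;
  Worker: TThread;
begin
  Guard := TObject.Create;

  TMonitor.Enter(Guard); // main thread takes the lock first

  Worker := TThread.CreateAnonymousThread(
    procedure
    begin
      // The lock is held, so this call has to wait on the lock event.
      TMonitor.Enter(Guard);
      TMonitor.Exit(Guard);
    end);
  Worker.FreeOnTerminate := False; // we want to WaitFor and Free it ourselves
  Worker.Start;

  Sleep(200);            // give the worker time to block on the lock
  TMonitor.Exit(Guard);  // wakes the worker through the lock event

  Worker.WaitFor;
  Worker.Free;

  // Freeing the lock object runs TObject.CleanupInstance, which destroys
  // the TMonitor record and hands the event back via FreeSyncObject.
  Guard.Free;
end.
```

In a real application the practical check is therefore to audit every object ever passed to TMonitor.Enter(), including objects locked indirectly by library code, and make sure each one is freed before the program exits.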

Possible solutions:

Unfortunately I don't like this. It's not right: the penalty for not freeing a small object should be a small memory leak, not a hanging application! This is especially bad for service applications, where a service might simply hang forever, not fully shut down but unable to respond to any request.

The solution for the Delphi team? They should NOT hang in the finalization code of the SysUtils unit, no matter what. They should probably ignore the Lock and move on to closing the Event handle. At that stage (finalization of the SysUtils unit), if there's still code running in some thread, it's in really bad shape anyway: most of the units have already been finalized, so it's no longer running in the environment it was designed to run in.

For Delphi users? We can replace the MonitorSupport with our own version, one that doesn't do those extensive tests at finalization time.

Cosmin Prund answered Sep 17 '22 14:09