In our Delphi XE4 application we are using an OmniThreadPool with MaxExecuting=4 to improve the efficiency of a certain calculation. Unfortunately we are having trouble with intermittent access violations (see for example the following MadExcept bug report http://ec2-72-44-42-247.compute-1.amazonaws.com/BugReport.txt). I was able to construct the following example which demonstrates the problem. After running the following console application, an access violation in System.SyncObjs.TCriticalSection.Acquire usually occurs within a minute or so. Can anybody tell me what I am doing wrong in the following code, or show me another way of achieving the desired result?
program OmniPoolCrashTest;
{$APPTYPE CONSOLE}
uses
Winapi.Windows, System.SysUtils,
DSiWin32, GpLists,
OtlSync, OtlThreadPool, OtlTaskControl, OtlComm, OtlTask;
const
cTimeToWaitForException = 10 * 60 * 1000; // program exits if no exception after 10 minutes
MSG_CALLEE_FINISHED = 113; // our custom Omni message ID
cMaxAllowedParallelCallees = 4; // enforced via thread pool
cCalleeDuration = 10; // 10 miliseconds
cCallerRepetitionInterval = 200; // 200 milliseconds
cDefaultNumberOfCallers = 10; // 10 callers each issuing 1 call every 200 milliseconds
var
gv_OmniThreadPool : IOmniThreadPool;
procedure OmniTaskProcedure_Callee(const task: IOmniTask);
begin
Sleep(cCalleeDuration);
task.Comm.Send(MSG_CALLEE_FINISHED);
end;
procedure PerformThreadPoolTest();
var
OmniTaskControl : IOmniTaskControl;
begin
OmniTaskControl := CreateTask(OmniTaskProcedure_Callee).Schedule(gv_OmniThreadPool);
WaitForSingleObject(OmniTaskControl.Comm.NewMessageEvent, INFINITE);
end;
procedure OmniTaskProcedure_Caller(const task: IOmniTask);
begin
while not task.Terminated do begin
PerformThreadPoolTest();
Sleep(cCallerRepetitionInterval);
end;
end;
var
CallerTasks : TGpInterfaceList<IOmniTaskControl>;
i : integer;
begin
gv_OmniThreadPool := CreateThreadPool('CalleeThreadPool');
gv_OmniThreadPool.MaxExecuting := cMaxAllowedParallelCallees;
CallerTasks := TGpInterfaceList<IOmniTaskControl>.Create();
for i := 1 to StrToIntDef(ParamStr(1), cDefaultNumberOfCallers) do begin
CallerTasks.Add( CreateTask(OmniTaskProcedure_Caller).Run() );
end;
Sleep(cTimeToWaitForException);
for i := 0 to CallerTasks.Count-1 do begin
CallerTasks[i].Terminate();
end;
CallerTasks.Free();
end.
It looks like your termination message is causing the problem. Removing the message and the WaitForSingleObject stopped the AV. In my tests just adding a .OnTerminated(procedure begin end) before the .Schedule also did enough to change the flow and to stop the error. So the code in that case would look like this:
procedure PerformThreadPoolTest();
var
OmniTaskControl : IOmniTaskControl;
begin
OmniTaskControl := CreateTask(OmniTaskProcedure_Callee).OnTerminated(procedure begin end).Schedule(gv_OmniThreadPool);
WaitForSingleObject(OmniTaskControl.Comm.NewMessageEvent, INFINITE);
end;
It looks to me like this might be the problem. otSharedInfo_ref has a property called MonitorLock. This is used to block changes to otSharedInfo_ref. If for some reason otSharedInfo_ref is freed while the acquire is waiting then you are likely to get some very weird behavior
The code as it stands looks like this:
procedure TOmniTask.InternalExecute(calledFromTerminate: boolean);
begin
...
// with internal monitoring this will not be processed if the task controller owner is also shutting down
sync := nil; // to remove the warning in the 'finally' clause below
otSharedInfo_ref.MonitorLock.Acquire;
try
sync := otSharedInfo_ref.MonitorLock.SyncObj;
if assigned(otSharedInfo_ref.Monitor) then
otSharedInfo_ref.Monitor.Send(COmniTaskMsg_Terminated,
integer(Int64Rec(UniqueID).Lo), integer(Int64Rec(UniqueID).Hi));
otSharedInfo_ref := nil;
finally sync.Release; end;
...
end; { TOmniTask.InternalExecute }
If otSharedInfo_ref.MonitorLock.Acquire is busy waiting and the object behind otSharedInfo_ref is freed then we end up in a very nasty place. Changing the code to this stopped the AV that was happening in InternalExecute:
procedure TOmniTask.InternalExecute(calledFromTerminate: boolean);
var
...
monitorLock: TOmniCS;
...
begin
...
// with internal monitoring this will not be processed if the task controller owner is also shutting down
sync := nil; // to remove the warning in the 'finally' clause below
monitorLock := otSharedInfo_ref.MonitorLock;
monitorLock.Acquire;
try
sync := monitorLock.SyncObj;
if assigned(otSharedInfo_ref) and assigned(otSharedInfo_ref.Monitor) then
otSharedInfo_ref.Monitor.Send(COmniTaskMsg_Terminated,
integer(Int64Rec(UniqueID).Lo), integer(Int64Rec(UniqueID).Hi));
otSharedInfo_ref := nil;
finally sync.Release; end;
...
end; { TOmniTask.InternalExecute }
I did start getting AV's in the OmniTaskProcedure_Callee method then on the "task.Comm.Send(MSG_CALLEE_FINISHED)" line so it's still not fixed but this should help others/Primoz to further identify what is going on. In the new error, task.Comm is often unassigned.
You have here an example of hard-to-find Task controller needs an owner problem. What happens is that the task controller sometimes gets destroyed before the task itself and that causes the task to access memory containing random data.
Problematic scenario goes like this ([T] marks task, [C] marks task controller):
In the Graymatter's workaround, OnTerminated
creates an implicit owner for the task inside the OmniThreadLibrary which "solves" the problem.
The correct way to wait on the task to complete is to call taskControler.WaitFor.
procedure OmniTaskProcedure_Callee(const task: IOmniTask);
begin
Sleep(cCalleeDuration);
end;
procedure PerformThreadPoolTest();
var
OmniTaskControl : IOmniTaskControl;
begin
OmniTaskControl := CreateTask(OmniTaskProcedure_Callee).Schedule(gv_OmniThreadPool);
OmniTaskControl.WaitFor(INFINITE);
end;
I will look into replacing shared memory record with reference-counted solution which would prevent such problems (or at least make them easier to find).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With