I have following application:
application deployment diagram
A - Application A is a .net wcf service hosted in IIS 7.5 using c# mongodriver 2.2.4 compiled in .net 4.5
B - Application B is a windows service application using mongodriver 1.11 compiled in .net 3.5
Both services are similar, service B is maintained for legacy systems and service A is under evolution.
Both applications are hosted in same servers. (Windows Standard 2008 R2) This applications had been running for more than 1 year perfectly but since 24/June/2016 application A (WCF) began a strange behavior when opening a new connection to Mongo Server:
> System.TimeoutException: A timeout occured after 30000ms selecting a > server using CompositeServerSelector{ Selectors = > ReadPreferenceServerSelector{ ReadPreference = { Mode = Primary, > TagSets = [] } }, LatencyLimitingServerSelector{ AllowedLatencyRange = > 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : > "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : > "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : > "Unspecified/mongodb-log-act01:27017" }", EndPoint: > "Unspecified/mongodb-log-act01:27017", State: "Disconnected", Type: > "Unknown" }] }. at > MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task > completedTask) at > MongoDB.Driver.Core.Clusters.Cluster.<WaitForDescriptionChangedAsync>d__44.MoveNext() > --- End of stack trace from previous location where exception was thrown --- at > System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at > System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task > task) at > MongoDB.Driver.Core.Clusters.Cluster.<SelectServerAsync>d__37.MoveNext() > --- End of stack trace from previous location where exception was thrown --- at > System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at > System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task > task) at > System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult() > at > MongoDB.Driver.Core.Bindings.ReadPreferenceBinding.<GetReadChannelSourceAsync>d__8.MoveNext() > --- End of stack trace from previous location where exception was thrown --- at > System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at > System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task > task) at > System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult() > at > MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__107.MoveNext() > --- End of stack trace from previous location where exception was thrown --- at > System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at > System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task > task) at > MongoDB.Driver.OperationExecutor.<ExecuteReadOperationAsync>d__1`1.MoveNext() > --- End of stack trace from previous location where exception was thrown --- at > System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at > System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task > task) at > MongoDB.Driver.MongoCollectionImpl`1.<ExecuteReadOperationAsync>d__59`1.MoveNext() > --- End of stack trace from previous location where exception was thrown --- at > System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at > System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task > task) at > System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult() > at > MongoDB.Driver.IAsyncCursorSourceExtensions.<ToListAsync>d__16`1.MoveNext() > --- End of stack trace from previous location where exception was thrown --- at > System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at > System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task > task) at > Liberty.LogService.Implementation.LogManagerService.<Inicializar>d__0.MoveNext()
This error is not constant, happen sometimes with no reason. But service B keeps working and if a try to connect to mongo from my desktop I can do it, so how can mongo driver raise an exception related to connectivity problems if the server is totally accessible?
The last attempt was migrated to last driver version. When this problems starts I was using driver 2.0.1
I appreciate any help
This is a very tricky problem related with the Task Library. In short, there are too many tasks created and scheduled so that one of the task MongoDB's driver are waiting for is not going to be able to be finished. I took a very long time to realise it is not a deadlock although it looks like it is.
Here is the step to reproduce:
This will fail at the same place you mentioned: MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)
If you put some break points, you will know that the WaitForDescriptionChangedHelper created a timeout task. It then waits for any one of the DescriptionUpdate task or the timeout task to complete. However, the DescriptionUpdate never happens, but why?
Now, back to my example, there is one interesting part: I started a timer. If you call the TestTask directly, it will run without any problem. By comparing them with the Visual Studio's Tasks window, you will noticed that the timer version will create a lot more tasks than the non-timer version. Let me explain this part a bit later. There is another important difference. You need to add debug lines in the Cluster.cs
:
protected void UpdateClusterDescription(ClusterDescription newClusterDescription)
{
ClusterDescription oldClusterDescription = null;
TaskCompletionSource<bool> oldDescriptionChangedTaskCompletionSource = null;
Console.WriteLine($"Before UpdateClusterDescription {_descriptionChangedTaskCompletionSource?.Task.Id}, {_descriptionChangedTaskCompletionSource?.Task?.GetHashCode().ToString("F8")}");
lock (_descriptionLock)
{
oldClusterDescription = _description;
_description = newClusterDescription;
oldDescriptionChangedTaskCompletionSource = _descriptionChangedTaskCompletionSource;
_descriptionChangedTaskCompletionSource = new TaskCompletionSource<bool>();
}
OnDescriptionChanged(oldClusterDescription, newClusterDescription);
Console.WriteLine($"Setting UpdateClusterDescription {oldDescriptionChangedTaskCompletionSource?.Task.Id}, {oldDescriptionChangedTaskCompletionSource?.Task?.GetHashCode().ToString("F8")}");
oldDescriptionChangedTaskCompletionSource.TrySetResult(true);
Console.WriteLine($"Set UpdateClusterDescription {oldDescriptionChangedTaskCompletionSource?.Task.Id}, {oldDescriptionChangedTaskCompletionSource?.Task?.GetHashCode().ToString("F8")}");
}
private void WaitForDescriptionChanged(IServerSelector selector, ClusterDescription description, Task descriptionChangedTask, TimeSpan timeout, CancellationToken cancellationToken)
{
using (var helper = new WaitForDescriptionChangedHelper(this, selector, description, descriptionChangedTask, timeout, cancellationToken))
{
Console.WriteLine($"Waiting {descriptionChangedTask?.Id}, {descriptionChangedTask?.GetHashCode().ToString("F8")}");
var index = Task.WaitAny(helper.Tasks);
helper.HandleCompletedTask(helper.Tasks[index]);
}
}
By adding these lines, you will also find out that the non-timer version will update twice but the timer version will only update once. And the second one comes from the "MonitorServerAsync" in ServerMonitor.cs. It turned out that, in the timer version, the MontiorServerAsync was executed but after it comes all the way through ServerMonitor.HeartbeatAsync, BinaryConnection.OpenAsync, BinaryConnection.OpenHelperAsync and TcpStreamFactory.CreateStreamAsync, it finally reached TcpStreamFactory.ResolveEndPointsAsync. The bad thing happens here: Dns.GetHostAddressesAsync
. This one never get executed. If you slightly modify the code and turn that into:
var task = Dns.GetHostAddressesAsync(dnsInitial.Host).ConfigureAwait(false);
return (await task)
.Select(x => new IPEndPoint(x, dnsInitial.Port))
.OrderBy(x => x, new PreferredAddressFamilyComparer(preferred))
.ToArray();
You will be able to find the task id. By looking into the Visual Studio's Tasks window, it is quite obvious that there are around 300 tasks in front of it. Only several of them are executing but blocked. If you add a Console.Writeline in the DoOneThing function, you will see that the task scheduler start several of them almost at the same time but then, it slows down to around one per second. So, this means, you need to wait for around 300 seconds before the task of resolving the dns starts to run. That why it exceeds the 30 seconds timeout.
Now, here comes a quick solution if you are not doing crazy things:
Task.Factory.StartNew(DoOneThing, TaskCreationOptions.LongRunning);
This will force the ThreadPoolScheduler to start a thread immediately instead of waiting one second before creating a new one.
However, this will not work if you are doing really crazy thing like me. Let's change the for loop from 300 to 30000, even this solution might also fail. The reason is that it creates too many threads. This is resource and time consuming. And it might start kicking off the GC process. All together, it might not be able to finish creating all those threads before the time is running out.
The perfect way is to stop creating lots of tasks and use the default scheduler to schedule them. You can try creating work item and put it into a ConcurrentQueue and then create several threads as the workers to consume the items.
However, if you don't want to change the original structure too much, you can try the following way:
Create a ThrottledTaskScheduler derived from TaskScheduler.
·
var taskScheduler = new ThrottledTaskScheduler(
TaskScheduler.Default,
128,
TaskCreationOptions.LongRunning | TaskCreationOptions.HideScheduler,
logger
);
var taskFactory = new TaskFactory(taskScheduler);
for (var i = 0; i < 30000; i++)
{
tasks.Add(taskFactory.StartNew(DoOneThing))
}
Task.WaitAll(tasks.ToArray());
You can take System.Threading.Tasks.ConcurrentExclusiveSchedulerPair.ConcurrentExclusiveTaskScheduler as a reference. It's a bit more complicated than the one we need. It's for some other purpose. So, don't worry about those parts which goes back and forth with the function inside the ConcurrentExclusiveSchedulerPair class. However, you cannot use it directly as it doesn't pass the TaskCreationOptions.LongRunning when it's creating the wrapping task.
It works for me. Good luck!
P.S.: The reason for having lots tasks in the timer version is probably laying inside the TaskScheduler.TryExecuteTaskInline. If it is in the main thread where the ThreadPool is created, it will be able to execute some of the tasks without putting them into the queue.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With