Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

System.TimeoutException: A timeout occured after 30000ms selecting a server using CompositeServerSelector

I have following application:

application deployment diagram

A - Application A is a .net wcf service hosted in IIS 7.5 using c# mongodriver 2.2.4 compiled in .net 4.5

B - Application B is a windows service application using mongodriver 1.11 compiled in .net 3.5

Both services are similar, service B is maintained for legacy systems and service A is under evolution.

Both applications are hosted in same servers. (Windows Standard 2008 R2) This applications had been running for more than 1 year perfectly but since 24/June/2016 application A (WCF) began a strange behavior when opening a new connection to Mongo Server:

> System.TimeoutException: A timeout occured after 30000ms selecting a
> server using CompositeServerSelector{ Selectors =
> ReadPreferenceServerSelector{ ReadPreference = { Mode = Primary,
> TagSets = [] } }, LatencyLimitingServerSelector{ AllowedLatencyRange =
> 00:00:00.0150000 } }. Client view of cluster state is { ClusterId :
> "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State :
> "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint :
> "Unspecified/mongodb-log-act01:27017" }", EndPoint:
> "Unspecified/mongodb-log-act01:27017", State: "Disconnected", Type:
> "Unknown" }] }.    at
> MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task
> completedTask)    at
> MongoDB.Driver.Core.Clusters.Cluster.<WaitForDescriptionChangedAsync>d__44.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---    at
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
> task)    at
> MongoDB.Driver.Core.Clusters.Cluster.<SelectServerAsync>d__37.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---    at
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
> task)    at
> System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
> at
> MongoDB.Driver.Core.Bindings.ReadPreferenceBinding.<GetReadChannelSourceAsync>d__8.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---    at
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
> task)    at
> System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
> at
> MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__107.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---    at
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
> task)    at
> MongoDB.Driver.OperationExecutor.<ExecuteReadOperationAsync>d__1`1.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---    at
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
> task)    at
> MongoDB.Driver.MongoCollectionImpl`1.<ExecuteReadOperationAsync>d__59`1.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---    at
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
> task)    at
> System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
> at
> MongoDB.Driver.IAsyncCursorSourceExtensions.<ToListAsync>d__16`1.MoveNext()
> --- End of stack trace from previous location where exception was thrown ---    at
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()    at
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task
> task)    at
> Liberty.LogService.Implementation.LogManagerService.<Inicializar>d__0.MoveNext()

This error is not constant, happen sometimes with no reason. But service B keeps working and if a try to connect to mongo from my desktop I can do it, so how can mongo driver raise an exception related to connectivity problems if the server is totally accessible?

The last attempt was migrated to last driver version. When this problems starts I was using driver 2.0.1

I appreciate any help

like image 701
Ricardo Falcades Avatar asked Oct 30 '22 22:10

Ricardo Falcades


1 Answers

This is a very tricky problem related with the Task Library. In short, there are too many tasks created and scheduled so that one of the task MongoDB's driver are waiting for is not going to be able to be finished. I took a very long time to realise it is not a deadlock although it looks like it is.

Here is the step to reproduce:

  1. Download the source code of the MongoDB's CSharp driver.
  2. Open that solution and create a console project inside and referencing the driver project.
  3. In the Main function, create a System.Threading.Timer which will call TestTask on time. Set the timer to start immediately once. At the end, add a Console.Read().
  4. In the TestTask, use a for loop to create 300 tasks by calling Task.Factory.StartNew(DoOneThing). Add all those tasks to a list and use Task.WaitAll to wait all of them to be finished.
  5. In the DoOneThing function, create a MongoClient and do some simple query.
  6. Now run it.

This will fail at the same place you mentioned: MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)

If you put some break points, you will know that the WaitForDescriptionChangedHelper created a timeout task. It then waits for any one of the DescriptionUpdate task or the timeout task to complete. However, the DescriptionUpdate never happens, but why?

Now, back to my example, there is one interesting part: I started a timer. If you call the TestTask directly, it will run without any problem. By comparing them with the Visual Studio's Tasks window, you will noticed that the timer version will create a lot more tasks than the non-timer version. Let me explain this part a bit later. There is another important difference. You need to add debug lines in the Cluster.cs:

    protected void UpdateClusterDescription(ClusterDescription newClusterDescription)
    {
        ClusterDescription oldClusterDescription = null;
        TaskCompletionSource<bool> oldDescriptionChangedTaskCompletionSource = null;

        Console.WriteLine($"Before UpdateClusterDescription {_descriptionChangedTaskCompletionSource?.Task.Id}, {_descriptionChangedTaskCompletionSource?.Task?.GetHashCode().ToString("F8")}");
        lock (_descriptionLock)
        {
            oldClusterDescription = _description;
            _description = newClusterDescription;

            oldDescriptionChangedTaskCompletionSource = _descriptionChangedTaskCompletionSource;
            _descriptionChangedTaskCompletionSource = new TaskCompletionSource<bool>();
        }

        OnDescriptionChanged(oldClusterDescription, newClusterDescription);
        Console.WriteLine($"Setting UpdateClusterDescription {oldDescriptionChangedTaskCompletionSource?.Task.Id}, {oldDescriptionChangedTaskCompletionSource?.Task?.GetHashCode().ToString("F8")}");
        oldDescriptionChangedTaskCompletionSource.TrySetResult(true);
        Console.WriteLine($"Set UpdateClusterDescription {oldDescriptionChangedTaskCompletionSource?.Task.Id}, {oldDescriptionChangedTaskCompletionSource?.Task?.GetHashCode().ToString("F8")}");
    }

    private void WaitForDescriptionChanged(IServerSelector selector, ClusterDescription description, Task descriptionChangedTask, TimeSpan timeout, CancellationToken cancellationToken)
    {
        using (var helper = new WaitForDescriptionChangedHelper(this, selector, description, descriptionChangedTask, timeout, cancellationToken))
        {
            Console.WriteLine($"Waiting {descriptionChangedTask?.Id}, {descriptionChangedTask?.GetHashCode().ToString("F8")}");
            var index = Task.WaitAny(helper.Tasks);
            helper.HandleCompletedTask(helper.Tasks[index]);
        }
    }

By adding these lines, you will also find out that the non-timer version will update twice but the timer version will only update once. And the second one comes from the "MonitorServerAsync" in ServerMonitor.cs. It turned out that, in the timer version, the MontiorServerAsync was executed but after it comes all the way through ServerMonitor.HeartbeatAsync, BinaryConnection.OpenAsync, BinaryConnection.OpenHelperAsync and TcpStreamFactory.CreateStreamAsync, it finally reached TcpStreamFactory.ResolveEndPointsAsync. The bad thing happens here: Dns.GetHostAddressesAsync. This one never get executed. If you slightly modify the code and turn that into:

    var task = Dns.GetHostAddressesAsync(dnsInitial.Host).ConfigureAwait(false);

    return (await task)
        .Select(x => new IPEndPoint(x, dnsInitial.Port))
        .OrderBy(x => x, new PreferredAddressFamilyComparer(preferred))
        .ToArray();

You will be able to find the task id. By looking into the Visual Studio's Tasks window, it is quite obvious that there are around 300 tasks in front of it. Only several of them are executing but blocked. If you add a Console.Writeline in the DoOneThing function, you will see that the task scheduler start several of them almost at the same time but then, it slows down to around one per second. So, this means, you need to wait for around 300 seconds before the task of resolving the dns starts to run. That why it exceeds the 30 seconds timeout.

Now, here comes a quick solution if you are not doing crazy things:

Task.Factory.StartNew(DoOneThing, TaskCreationOptions.LongRunning);

This will force the ThreadPoolScheduler to start a thread immediately instead of waiting one second before creating a new one.

However, this will not work if you are doing really crazy thing like me. Let's change the for loop from 300 to 30000, even this solution might also fail. The reason is that it creates too many threads. This is resource and time consuming. And it might start kicking off the GC process. All together, it might not be able to finish creating all those threads before the time is running out.

The perfect way is to stop creating lots of tasks and use the default scheduler to schedule them. You can try creating work item and put it into a ConcurrentQueue and then create several threads as the workers to consume the items.

However, if you don't want to change the original structure too much, you can try the following way:

Create a ThrottledTaskScheduler derived from TaskScheduler.

  1. This ThrottledTaskScheduler accepts a TaskScheduler as the underlying one which will run the actual task.
  2. Dump the tasks to the underlying scheduler but if it exceed the limit, put it into a queue instead.
  3. If any of the task finished, check the queue and try to dump them in to the underlying scheduler within the limit.
  4. Use the following code to start all those crazy new tasks:

·

var taskScheduler = new ThrottledTaskScheduler(
    TaskScheduler.Default,
    128,
    TaskCreationOptions.LongRunning | TaskCreationOptions.HideScheduler,
    logger
    );
var taskFactory = new TaskFactory(taskScheduler);
for (var i = 0; i < 30000; i++)
{
    tasks.Add(taskFactory.StartNew(DoOneThing))
}
Task.WaitAll(tasks.ToArray());

You can take System.Threading.Tasks.ConcurrentExclusiveSchedulerPair.ConcurrentExclusiveTaskScheduler as a reference. It's a bit more complicated than the one we need. It's for some other purpose. So, don't worry about those parts which goes back and forth with the function inside the ConcurrentExclusiveSchedulerPair class. However, you cannot use it directly as it doesn't pass the TaskCreationOptions.LongRunning when it's creating the wrapping task.

It works for me. Good luck!

P.S.: The reason for having lots tasks in the timer version is probably laying inside the TaskScheduler.TryExecuteTaskInline. If it is in the main thread where the ThreadPool is created, it will be able to execute some of the tasks without putting them into the queue.

like image 130
Sumtec Avatar answered Nov 15 '22 07:11

Sumtec