Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HttpClient async requests not completing for large batch sent out in a loop

I think I've managed to make a test that shows this problem repeatably, at least on my system. This question relates to HttpClient being used for a bad endpoint (nonexistant endpoint, the target is down).

The problem is that the number of completed tasks falls short of the total, usually by about a few. I don't mind requests not working, but this just results in the app just hanging there when the results are awaited.

I get the following result form the test code below:

Elapsed: 237.2009884 seconds. Tasks in batch array: 8000 Completed Tasks : 7993

If i set batchsize to 8 instead of 8000, it completes. For 8000 it jams on the WhenAll .

I wonder if other people get the same result, if I am doing something wrong, and if this appears to be a bug.

using System;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

namespace CustomArrayTesting
{

    /// <summary>
    /// Problem: a large batch of async http requests is done in a loop using HttpClient, and a few of them never complete
    /// </summary>
    class ProgramTestHttpClient
    {
        static readonly int batchSize = 8000; //large batch size brings about the problem

        static readonly Uri Target = new Uri("http://localhost:8080/BadAddress");

        static TimeSpan httpClientTimeout = TimeSpan.FromSeconds(3);  // short Timeout seems to bring about the problem.

        /// <summary>
        /// Sends off a bunch of async httpRequests using a loop, and then waits for the batch of requests to finish.
        /// I installed asp.net web api client libraries Nuget package.
        /// </summary>
        static void Main(String[] args)
        {
            httpClient.Timeout = httpClientTimeout; 

            stopWatch = new Stopwatch();
            stopWatch.Start();


            // this timer updates the screen with the number of completed tasks in the batch (See timerAction method bellow Main)
            TimerCallback _timerAction = timerAction;
            TimerCallback _resetTimer = ResetTimer;
            TimerCallback _timerCallback = _timerAction + _resetTimer;

            timer = new Timer(_timerCallback, null, TimeSpan.FromSeconds(1), Timeout.InfiniteTimeSpan);
            //

            for (int i = 0; i < batchSize; i++)
            {
                Task<HttpResponseMessage> _response = httpClient.PostAsJsonAsync<Object>(Target, new Object());//WatchRequestBody()

                Batch[i] = _response;
            }

            try
            {
                Task.WhenAll(Batch).Wait();
            }
            catch (Exception ex)
            {

            }

            timer.Dispose();
            timerAction(null);
            stopWatch.Stop();


            Console.WriteLine("Done");
            Console.ReadLine();
        }

        static readonly TimeSpan timerRepeat = TimeSpan.FromSeconds(1);

        static readonly HttpClient httpClient = new HttpClient();

        static Stopwatch stopWatch;

        static System.Threading.Timer timer;

        static readonly Task[] Batch = new Task[batchSize];

        static void timerAction(Object state)
        {
            Console.Clear();
            Console.WriteLine("Elapsed: {0} seconds.", stopWatch.Elapsed.TotalSeconds);
            var _tasks = from _task in Batch where _task != null select _task;
            int _tasksCount = _tasks.Count();

            var _completedTasks = from __task in _tasks where __task.IsCompleted select __task;
            int _completedTasksCount = _completedTasks.Count();

            Console.WriteLine("Tasks in batch array: {0}       Completed Tasks : {1} ", _tasksCount, _completedTasksCount);

        }

        static void ResetTimer(Object state)
        {
            timer.Change(timerRepeat, Timeout.InfiniteTimeSpan);
        }
    }
}

Sometimes it just crashes before finishing with an Access Violation unhandled exception. The call stack just says:

>   mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode = 1225, uint numBytes = 0, System.Threading.NativeOverlapped* pOVERLAP = 0x08b38b98) 
    [Native to Managed Transition]  
    kernel32.dll!@BaseThreadInitThunk@12()  
    ntdll.dll!___RtlUserThreadStart@8()     
    ntdll.dll!__RtlUserThreadStart@8()  

Most of the time it doesn't crash but just never finishes waiting on the whenall. In any case the following first chance exceptions are thrown for each request:

A first chance exception of type 'System.Net.Sockets.SocketException' occurred in System.dll
A first chance exception of type 'System.Net.WebException' occurred in System.dll
A first chance exception of type 'System.AggregateException' occurred in mscorlib.dll
A first chance exception of type 'System.ObjectDisposedException' occurred in System.dll

I made the debugger stop on the Object disposed exception, and got this call stack:

>   System.dll!System.Net.Sockets.NetworkStream.UnsafeBeginWrite(byte[] buffer, int offset, int size, System.AsyncCallback callback, object state) + 0x136 bytes    
    System.dll!System.Net.PooledStream.UnsafeBeginWrite(byte[] buffer, int offset, int size, System.AsyncCallback callback, object state) + 0x19 bytes  
    System.dll!System.Net.ConnectStream.WriteHeaders(bool async = true) + 0x105 bytes   
    System.dll!System.Net.HttpWebRequest.EndSubmitRequest() + 0x8a bytes    
    System.dll!System.Net.HttpWebRequest.SetRequestSubmitDone(System.Net.ConnectStream submitStream) + 0x11d bytes  
    System.dll!System.Net.Connection.CompleteConnection(bool async, System.Net.HttpWebRequest request = {System.Net.HttpWebRequest}) + 0x16c bytes  
    System.dll!System.Net.Connection.CompleteConnectionWrapper(object request, object state) + 0x4e bytes   
    System.dll!System.Net.PooledStream.ConnectionCallback(object owningObject, System.Exception e, System.Net.Sockets.Socket socket, System.Net.IPAddress address) + 0xf0 bytes 
    System.dll!System.Net.ServicePoint.ConnectSocketCallback(System.IAsyncResult asyncResult) + 0xe6 bytes  
    System.dll!System.Net.LazyAsyncResult.Complete(System.IntPtr userToken) + 0x65 bytes    
    System.dll!System.Net.ContextAwareResult.Complete(System.IntPtr userToken) + 0x92 bytes 
    System.dll!System.Net.LazyAsyncResult.ProtectedInvokeCallback(object result, System.IntPtr userToken) + 0xa6 bytes  
    System.dll!System.Net.Sockets.BaseOverlappedAsyncResult.CompletionPortCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* nativeOverlapped) + 0x98 bytes 
    mscorlib.dll!System.Threading._IOCompletionCallback.PerformIOCompletionCallback(uint errorCode, uint numBytes, System.Threading.NativeOverlapped* pOVERLAP) + 0x6e bytes    
    [Native to Managed Transition]

The exception message was:

{"Cannot access a disposed object.\r\nObject name: 'System.Net.Sockets.NetworkStream'."}    System.Exception {System.ObjectDisposedException}

Notice the relationship to that unhandled access violation exception that I rarely see.

So, it seems that HttpClient is not robust for when the target is down. I am doing this on windows 7 32 by the way.

like image 391
Elliot Avatar asked Nov 08 '13 20:11

Elliot


3 Answers

I looked through the source of HttpClient using reflector. For the synchronously executed part of the operation (when it is kicked-off), there seems to be no timeout applied to the returned task, as far as I can see. There is some timeout implementation that calls Abort() on an HttpWebRequest object, but again they seem to have missed out any timeout cancellation or faulting of the returned task on this side of the async function. There maybe something on the callback side, but sometimes the callback is probably "going missing", leading to the returned Task never completing.

I posted a question asking how to add a timeout to any Task, and an answerer gave this very nice solution (here as an extension method):

public static Task<T> WithTimeout<T>(this Task<T> task, TimeSpan timeout)
{
    var delay = task.ContinueWith(t => t.Result
        , new CancellationTokenSource(timeout).Token);
    return Task.WhenAny(task, delay).Unwrap();
}

So, calling HttpClient like this should prevent any "Tasks gone bad" from never ending:

Task<HttpResponseMessage> _response = httpClient.PostAsJsonAsync<Object>(Target, new Object()).WithTimeout<HttpResponseMessage>(httpClient.Timeout);

A couple more things that I think made requests less likely to go missing: 1. Increasing the timeout from 3s to 30s made all the tasks finish in the program that I posted with this question. 2. Increasing the number of concurrent connections allowed using for example System.Net.ServicePointManager.DefaultConnectionLimit = 100;

like image 82
Elliot Avatar answered Nov 17 '22 14:11

Elliot


I came across this question when googling for solutions to a similar problem from WCF. That series of exceptions is exactly the same pattern I see. Eventually through a ton of investigation I found a bug in HttpWebRequest that HttpClient uses. The HttpWebRequest gets in a bad state and only sends the HTTP headers. It then sits waiting for a response which will never be sent.

I've raised a ticket with Microsoft Connect which can be found here: https://connect.microsoft.com/VisualStudio/feedback/details/1805955/async-post-httpwebrequest-hangs-when-a-socketexception-occurs-during-setsocketoption

The specifics are in the ticket but it requires an async POST call from the HttpWebRequest to a non-localhost machine. I've reproduced it on Windows 7 in .Net 4.5 and 4.6. The failed SetSocketOption call, which raises the SocketException, only fails on Windows 7 in testing.

For us the UseNagleAlgorithm setting causes the SetSocketOption call, but we can't avoid it as WCF turns off UseNagleAlgorithm and you can't stop it. In WCF it appears as a timed out call. Obviously this isn't great as we're spending 60s waiting for nothing.

like image 27
Laurence Avatar answered Nov 17 '22 15:11

Laurence


Your exception information is being lost in the WhenAll task. Instead of using that, try this:

Task aggregateTask = Task.Factory.ContinueWhenAll(
    Batch,
    TaskExtrasExtensions.PropagateExceptions,
    TaskContinuationOptions.ExecuteSynchronously);

aggregateTask.Wait();

This uses the PropagateExceptions extension method from the Parallel Extensions Extras sample code to ensure that exception information from the tasks in the batch operation are not lost:

/// <summary>Propagates any exceptions that occurred on the specified tasks.</summary>
/// <param name="tasks">The Task instances whose exceptions are to be propagated.</param>
public static void PropagateExceptions(this Task [] tasks)
{
    if (tasks == null) throw new ArgumentNullException("tasks");
    if (tasks.Any(t => t == null)) throw new ArgumentException("tasks");
    if (tasks.Any(t => !t.IsCompleted)) throw new InvalidOperationException("A task has not completed.");
    Task.WaitAll(tasks);
}
like image 36
Sam Harwell Avatar answered Nov 17 '22 15:11

Sam Harwell