I am working on a web crawler implementation, but I am facing a strange memory leak in ASP.NET Web API's HttpClient.
The cut-down version is here:
Edit: I found the problem, and it is not HttpClient that is leaking. See my answer.
I have added Dispose calls with no effect:
static void Main(string[] args)
{
    int waiting = 0;
    const int MaxWaiting = 100;
    var httpClient = new HttpClient();
    foreach (var link in File.ReadAllLines("links.txt"))
    {
        while (waiting >= MaxWaiting)
        {
            Thread.Sleep(1000);
            Console.WriteLine("Waiting ...");
        }

        httpClient.GetAsync(link)
            .ContinueWith(t =>
            {
                try
                {
                    var httpResponseMessage = t.Result;
                    if (httpResponseMessage.IsSuccessStatusCode)
                        httpResponseMessage.Content.LoadIntoBufferAsync()
                            .ContinueWith(t2 =>
                            {
                                if (t2.IsFaulted)
                                {
                                    httpResponseMessage.Dispose();
                                    Console.ForegroundColor = ConsoleColor.Magenta;
                                    Console.WriteLine(t2.Exception);
                                }
                                else
                                {
                                    httpResponseMessage.Content.ReadAsStringAsync()
                                        .ContinueWith(t3 =>
                                        {
                                            Interlocked.Decrement(ref waiting);
                                            try
                                            {
                                                Console.ForegroundColor = ConsoleColor.White;
                                                Console.WriteLine(httpResponseMessage.RequestMessage.RequestUri);
                                                string s = t3.Result;
                                            }
                                            catch (Exception ex3)
                                            {
                                                Console.ForegroundColor = ConsoleColor.Yellow;
                                                Console.WriteLine(ex3);
                                            }
                                            httpResponseMessage.Dispose();
                                        });
                                }
                            });
                }
                catch (Exception e)
                {
                    Interlocked.Decrement(ref waiting);
                    Console.ForegroundColor = ConsoleColor.Red;
                    Console.WriteLine(e);
                }
            });

        Interlocked.Increment(ref waiting);
    }

    Console.Read();
}
The file containing links is available here.
This results in constantly rising memory usage. Memory analysis shows many bytes possibly held by the AsyncCallback. I have done many memory leak analyses before, but this one seems to be at the HttpClient level.
I am using C# 4.0, so there is no async/await here; only TPL 4.0 is used.
The code above works, but it is not optimised and sometimes throws a tantrum; still, it is enough to reproduce the effect. The point is that I cannot find anything that could cause the memory to be leaked.
OK, I got to the bottom of this. Thanks to @Tugberk, @Darrel and @youssef for spending time on this.
Basically, the initial problem was that I was spawning too many tasks. This started to take its toll, so I had to cut back and keep some state to make sure the number of concurrent tasks was limited. This is a big challenge for writing processes that have to use the TPL to schedule tasks. We can control threads in the thread pool, but we also need to control the tasks we are creating, so no level of async/await will help with this.
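As a rough illustration (not the exact code I ended up with), the "keep some state to limit concurrent tasks" idea can be sketched with a SemaphoreSlim, which is available on .NET 4.0 and needs no async/await — the producer loop simply blocks once the limit is reached, and each continuation frees a slot when it finishes:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class ThrottledCrawler
{
    static void Main()
    {
        // Allow at most 100 requests in flight at any time (same cap as MaxWaiting).
        var throttle = new SemaphoreSlim(100);
        var httpClient = new HttpClient();

        foreach (var link in File.ReadAllLines("links.txt"))
        {
            throttle.Wait(); // blocks the producer loop when 100 tasks are pending

            httpClient.GetAsync(link).ContinueWith(t =>
            {
                try
                {
                    using (var response = t.Result) // dispose the response deterministically
                    {
                        Console.WriteLine("{0}: {1}",
                            response.RequestMessage.RequestUri, response.StatusCode);
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex);
                }
                finally
                {
                    throttle.Release(); // always free a slot, success or failure
                }
            });
        }

        Console.Read();
    }
}
```

The important difference from my original repro is that the slot is released in a finally block, so failed requests cannot permanently consume capacity.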
I managed to reproduce the leak only a couple of times with this code; other times, after growing, the memory would just suddenly drop. I know that there was a revamp of the GC in 4.5, so perhaps the issue here is that the GC did not kick in enough, although I have been looking at the perf counters for GC generation 0, 1 and 2 collections.
HttpClient does NOT cause a memory leak. I'm no good at diagnosing memory issues, but I gave it a try with the following code. It's in .NET 4.5 and uses the async/await feature of C#, too. It seems to keep memory usage around 10-15 MB for the entire process (not sure if you would consider this better memory usage, though). But if you watch the # Gen 0 Collections, # Gen 1 Collections and # Gen 2 Collections perf counters, they are pretty high with the code below.
If you remove the GC.Collect calls below, memory goes back and forth between 30 MB and 50 MB for the entire process. The interesting part is that when I run your code on my 4-core machine, I don't see abnormal memory usage by the process either. I have .NET 4.5 installed on my machine; if you don't, the problem might be related to CLR internals of .NET 4.0, and I am sure that the TPL has improved a lot in .NET 4.5 in terms of resource usage.
class Program
{
    static void Main(string[] args)
    {
        ServicePointManager.DefaultConnectionLimit = 500;
        CrawlAsync().ContinueWith(task => Console.WriteLine("***DONE!"));
        Console.ReadLine();
    }

    private static async Task CrawlAsync()
    {
        int numberOfCores = Environment.ProcessorCount;
        List<string> requestUris = File.ReadAllLines(@"C:\Users\Tugberk\Downloads\links.txt").ToList();
        ConcurrentDictionary<int, Tuple<Task, HttpRequestMessage>> tasks =
            new ConcurrentDictionary<int, Tuple<Task, HttpRequestMessage>>();
        List<HttpRequestMessage> requestsToDispose = new List<HttpRequestMessage>();
        var httpClient = new HttpClient();

        for (int i = 0; i < numberOfCores; i++)
        {
            string requestUri = requestUris.First();
            var requestMessage = new HttpRequestMessage(HttpMethod.Get, requestUri);
            Task task = MakeCall(httpClient, requestMessage);
            tasks.AddOrUpdate(task.Id, Tuple.Create(task, requestMessage), (index, t) => t);

            requestUris.RemoveAt(0);
        }

        while (tasks.Values.Count > 0)
        {
            Task task = await Task.WhenAny(tasks.Values.Select(x => x.Item1));
            Tuple<Task, HttpRequestMessage> removedTask;
            tasks.TryRemove(task.Id, out removedTask);
            removedTask.Item1.Dispose();
            removedTask.Item2.Dispose();

            if (requestUris.Count > 0)
            {
                var requestUri = requestUris.First();
                var requestMessage = new HttpRequestMessage(HttpMethod.Get, requestUri);
                Task newTask = MakeCall(httpClient, requestMessage);
                tasks.AddOrUpdate(newTask.Id, Tuple.Create(newTask, requestMessage), (index, t) => t);

                requestUris.RemoveAt(0);
            }

            GC.Collect(0);
            GC.Collect(1);
            GC.Collect(2);
        }

        httpClient.Dispose();
    }

    private static async Task MakeCall(HttpClient httpClient, HttpRequestMessage requestMessage)
    {
        Console.WriteLine("**Starting new request for {0}!", requestMessage.RequestUri);

        var response = await httpClient.SendAsync(requestMessage).ConfigureAwait(false);
        Console.WriteLine("**Request is completed for {0}! Status Code: {1}",
            requestMessage.RequestUri, response.StatusCode);

        using (response)
        {
            if (response.IsSuccessStatusCode)
            {
                using (response.Content)
                {
                    Console.WriteLine("**Getting the HTML for {0}!", requestMessage.RequestUri);
                    string html = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
                    Console.WriteLine("**Got the HTML for {0}! Length: {1}",
                        requestMessage.RequestUri, html.Length);
                }
            }
            else if (response.Content != null)
            {
                response.Content.Dispose();
            }
        }
    }
}