
When to cache Tasks?

I was watching The zen of async: Best practices for best performance, and Stephen Toub started to talk about Task caching, where instead of caching the results of tasks you cache the tasks themselves. As far as I understood, starting a new task for every job is expensive and should be minimized as much as possible. At around 28:00 he showed this method:

    private static readonly ConcurrentDictionary<string, string> s_urlToContents =
        new ConcurrentDictionary<string, string>();

    public static async Task<string> GetContentsAsync(string url)
    {
        string contents;
        if (!s_urlToContents.TryGetValue(url, out contents))
        {
            var response = await new HttpClient().GetAsync(url);
            contents = await response.EnsureSuccessStatusCode().Content.ReadAsStringAsync();
            s_urlToContents.TryAdd(url, contents);
        }
        return contents;
    }

At first glance this looks like a well-thought-out method where you cache results; I didn't even think about caching the job of getting the contents.

And then he showed this method:

    private static readonly ConcurrentDictionary<string, Task<string>> s_urlToContents =
        new ConcurrentDictionary<string, Task<string>>();

    public static Task<string> GetContentsAsync(string url)
    {
        Task<string> contents;
        if (!s_urlToContents.TryGetValue(url, out contents))
        {
            contents = GetContentsInternalAsync(url);
            // Cache the task itself, but only once it has run to completion.
            contents.ContinueWith(t => { s_urlToContents.TryAdd(url, t); },
                CancellationToken.None,
                TaskContinuationOptions.OnlyOnRanToCompletion |
                TaskContinuationOptions.ExecuteSynchronously,
                TaskScheduler.Default);
        }
        return contents;
    }

    // Needs its own name: two methods with the same name and parameters do not compile.
    private static async Task<string> GetContentsInternalAsync(string url)
    {
        var response = await new HttpClient().GetAsync(url);
        return await response.EnsureSuccessStatusCode().Content.ReadAsStringAsync();
    }

I have trouble understanding how this actually helps more than just storing the results.

Does this mean that you're using fewer Tasks to get the data?

And also, how do we know when to cache tasks? As far as I understand, if you're caching in the wrong place you just add a load of overhead and stress the system too much.

Nikola.Lukovic asked Mar 18 '16 12:03


2 Answers

I have trouble understanding how this actually helps more than just storing the results.

When a method is marked with the async modifier, the compiler automatically transforms it into a state machine, as Stephen demonstrates in previous slides. This means that calling the first method always results in the creation of a Task.

In the second example, notice that Stephen removed the async modifier and the signature of the method is now public static Task<string> GetContentsAsync(string url). This means that the responsibility for creating the Task lies with the implementer of the method and not the compiler. By caching Task<string>, the only "penalty" of creating the Task (actually two tasks, as ContinueWith will also create one) is paid when it's unavailable in the cache, and not for each method call.

In this particular example, IMO, the point wasn't to re-use the network operation that is already ongoing when the first task executes; it was simply to reduce the number of allocated Task objects.
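
A minimal sketch of that difference, under stated assumptions: FetchAsync is a hypothetical stand-in for the HTTP call, and the task cache stores the task immediately instead of using a ContinueWith callback. It only compares which Task instances the two styles hand back on a cache hit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class TaskAllocationDemo
{
    private static readonly ConcurrentDictionary<string, string> s_results =
        new ConcurrentDictionary<string, string>();
    private static readonly ConcurrentDictionary<string, Task<string>> s_tasks =
        new ConcurrentDictionary<string, Task<string>>();

    // Hypothetical stand-in for the real network call.
    private static Task<string> FetchAsync(string key)
    {
        return Task.FromResult("contents of " + key);
    }

    // Caches results. Because the method is async, the compiler-generated
    // code produces the Task<string> that the caller receives, even when
    // the value comes straight from the cache.
    public static async Task<string> GetByResultAsync(string key)
    {
        string contents;
        if (!s_results.TryGetValue(key, out contents))
        {
            contents = await FetchAsync(key);
            s_results.TryAdd(key, contents);
        }
        return contents;
    }

    // Caches tasks. The method is not async, so a cache hit returns the
    // stored Task<string> instance unchanged.
    public static Task<string> GetByTaskAsync(string key)
    {
        Task<string> contents;
        if (!s_tasks.TryGetValue(key, out contents))
        {
            contents = FetchAsync(key);
            s_tasks.TryAdd(key, contents);
        }
        return contents;
    }

    public static void Main()
    {
        Task<string> r1 = GetByResultAsync("a"), r2 = GetByResultAsync("a");
        Task<string> t1 = GetByTaskAsync("a"), t2 = GetByTaskAsync("a");

        Console.WriteLine("result cache, same Task instance: " + object.ReferenceEquals(r1, r2));
        Console.WriteLine("task cache, same Task instance:   " + object.ReferenceEquals(t1, t2));
    }
}
```

The task cache should report the same instance for both calls. The result cache is expected to report different instances, since each call to the async method yields its own Task for a non-null string result.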

how do we know when to cache tasks?

Think of caching a Task as if it were anything else, and this question can be viewed from a broader perspective: when should I cache something? The answer is broad, but I think the most common use case is when you have an expensive operation on the hot path of your application. Should you always be caching tasks? Definitely not. The overhead of the state-machine allocation is usually negligible. If needed, profile your app, and then (and only then) consider whether caching would help in your particular use case.

Yuval Itzchakov answered Sep 19 '22 18:09


Let's assume you are talking to a remote service which takes the name of a city and returns its zip codes. The service is remote and under load so we are talking to a method with an asynchronous signature:

    interface IZipCodeService
    {
        Task<ICollection<ZipCode>> GetZipCodesAsync(string cityName);
    }

Since the service needs a while for every request, we would like to implement a local cache for it. Naturally the cache will also have an asynchronous signature, maybe even implementing the same interface (see Facade pattern). A synchronous signature would break the best practice of never calling asynchronous code synchronously with .Wait(), .Result or similar. At the very least, the cache should leave that decision up to the caller.

So let's do a first iteration on this:

    class ZipCodeCache : IZipCodeService
    {
        private readonly IZipCodeService realService;
        private readonly ConcurrentDictionary<string, ICollection<ZipCode>> zipCache =
            new ConcurrentDictionary<string, ICollection<ZipCode>>();

        public ZipCodeCache(IZipCodeService realService)
        {
            this.realService = realService;
        }

        public Task<ICollection<ZipCode>> GetZipCodesAsync(string cityName)
        {
            ICollection<ZipCode> zipCodes;
            if (zipCache.TryGetValue(cityName, out zipCodes))
            {
                // Already in cache. Returning cached value
                return Task.FromResult(zipCodes);
            }
            return this.realService.GetZipCodesAsync(cityName).ContinueWith((task) =>
            {
                this.zipCache.TryAdd(cityName, task.Result);
                return task.Result;
            });
        }
    }

As you can see, the cache does not cache Task objects but the returned collections of ZipCode values. But by doing so it has to construct a Task for every cache hit by calling Task.FromResult, and I think that is exactly what Stephen Toub tries to avoid. A Task object comes with overhead, especially for the garbage collector, because every cache hit allocates a new object that becomes garbage as soon as the caller is done with it.

The only option to work around this is by caching the whole Task object:

    class ZipCodeCache2 : IZipCodeService
    {
        private readonly IZipCodeService realService;
        private readonly ConcurrentDictionary<string, Task<ICollection<ZipCode>>> zipCache =
            new ConcurrentDictionary<string, Task<ICollection<ZipCode>>>();

        public ZipCodeCache2(IZipCodeService realService)
        {
            this.realService = realService;
        }

        public Task<ICollection<ZipCode>> GetZipCodesAsync(string cityName)
        {
            Task<ICollection<ZipCode>> zipCodes;
            if (zipCache.TryGetValue(cityName, out zipCodes))
            {
                return zipCodes;
            }
            return this.realService.GetZipCodesAsync(cityName).ContinueWith((task) =>
            {
                this.zipCache.TryAdd(cityName, task);
                return task.Result;
            });
        }
    }

As you can see the creation of Tasks by calling Task.FromResult is gone. Furthermore it is not possible to avoid this Task creation when using the async/await keywords because internally they will create a Task to return no matter what your code has cached. Something like:

    public async Task<ICollection<ZipCode>> GetZipCodesAsync(string cityName)
    {
        Task<ICollection<ZipCode>> zipCodes;
        if (zipCache.TryGetValue(cityName, out zipCodes))
        {
            // Compile error: an async method must return ICollection<ZipCode> here, not a Task.
            return zipCodes;
        }

will not compile.

Don't get confused by Stephen Toub's ContinueWith flags TaskContinuationOptions.OnlyOnRanToCompletion and TaskContinuationOptions.ExecuteSynchronously. ExecuteSynchronously is another performance optimization, and OnlyOnRanToCompletion keeps failed or canceled tasks out of the cache. Neither is related to the main objective of caching Tasks.
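
To see what the two flags do in isolation, here is a minimal sketch. The fetch delegate and the IsCached helper are hypothetical and exist only for this illustration. The continuation that stores the task is registered with OnlyOnRanToCompletion, so it is skipped for a task that fails.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationFlagsDemo
{
    private static readonly ConcurrentDictionary<string, Task<string>> s_cache =
        new ConcurrentDictionary<string, Task<string>>();

    // fetch is a hypothetical stand-in for the real download.
    public static Task<string> GetAsync(string key, Func<Task<string>> fetch)
    {
        Task<string> contents;
        if (!s_cache.TryGetValue(key, out contents))
        {
            contents = fetch();
            contents.ContinueWith(
                t => { s_cache.TryAdd(key, t); },
                CancellationToken.None,
                // Skip the continuation for faulted or canceled tasks, and
                // ask to run it inline instead of queueing separate work.
                TaskContinuationOptions.OnlyOnRanToCompletion |
                TaskContinuationOptions.ExecuteSynchronously,
                TaskScheduler.Default);
        }
        return contents;
    }

    public static bool IsCached(string key)
    {
        return s_cache.ContainsKey(key);
    }

    public static void Main()
    {
        Task<string> ok = GetAsync("ok", () => Task.FromResult("contents"));
        Task<string> bad = GetAsync("bad",
            () => Task.FromException<string>(new InvalidOperationException("simulated failure")));

        Console.WriteLine("ok status: " + ok.Status + ", cached: " + IsCached("ok"));
        Console.WriteLine("bad status: " + bad.Status + ", cached: " + IsCached("bad"));
        // Observe the failure so it is not reported as unobserved later.
        Console.WriteLine(bad.Exception.InnerException.Message);
    }
}
```

The failed request never reaches the cache, so a later call with the same key retries the download instead of replaying the failure.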

As with every cache, you should consider some mechanism which cleans the cache from time to time and removes entries which are too old or invalid. You could also implement a policy which limits the cache to n entries and tries to keep the most frequently requested items by introducing some counting.
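
One possible shape for the age-based cleanup is sketched below. ExpiringTaskCache, its maxAge parameter and the injectable clock are all hypothetical names introduced for this illustration. It is a sketch under simplifying assumptions: the task is stored as soon as it is created, so a failed task also stays cached until it expires, and expired entries are replaced on access rather than swept in the background.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: caches tasks and treats entries older than maxAge as missing.
public class ExpiringTaskCache<TValue>
{
    private class Entry
    {
        public Task<TValue> CachedTask;
        public DateTime CreatedUtc;
    }

    private readonly ConcurrentDictionary<string, Entry> entries =
        new ConcurrentDictionary<string, Entry>();
    private readonly TimeSpan maxAge;
    private readonly Func<DateTime> utcNow;

    // utcNow can be replaced with a fake clock; it defaults to the system clock.
    public ExpiringTaskCache(TimeSpan maxAge, Func<DateTime> utcNow = null)
    {
        this.maxAge = maxAge;
        this.utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public Task<TValue> GetAsync(string key, Func<string, Task<TValue>> fetch)
    {
        DateTime now = utcNow();
        Entry entry;
        if (entries.TryGetValue(key, out entry) && now - entry.CreatedUtc < maxAge)
        {
            return entry.CachedTask; // fresh hit: the stored task is returned as-is
        }
        // Missing or too old: fetch again and overwrite the entry.
        var fresh = new Entry { CachedTask = fetch(key), CreatedUtc = now };
        entries[key] = fresh;
        return fresh.CachedTask;
    }
}

public static class ExpiringTaskCacheDemo
{
    public static void Main()
    {
        DateTime now = new DateTime(2016, 3, 18, 12, 0, 0, DateTimeKind.Utc);
        int fetches = 0;
        var cache = new ExpiringTaskCache<string>(TimeSpan.FromMinutes(5), () => now);
        Func<string, Task<string>> fetch = city =>
        {
            fetches++;
            return Task.FromResult("zip codes for " + city);
        };

        cache.GetAsync("Berlin", fetch);
        cache.GetAsync("Berlin", fetch); // served from the cache
        now = now.AddMinutes(10);
        cache.GetAsync("Berlin", fetch); // entry expired, fetched again
        Console.WriteLine(fetches);      // prints 2
    }
}
```

Passing the clock in as a delegate keeps the expiry logic testable without waiting for real time to pass.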

I did some benchmarking with and without caching of Tasks. You can find the code here: http://pastebin.com/SEr2838A. The results look like this on my machine (with .NET 4.6):

    Caching ZipCodes: 00:00:04.6653104 Gen0: 3560 Gen1: 0 Gen2: 0
    Caching Tasks:    00:00:03.9452951 Gen0: 1017 Gen1: 0 Gen2: 0
Thomas Zeman answered Sep 20 '22 18:09