Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

StackExchange.Redis Deadlocking

I'm using StackExchange.Redis (SE.R henceforth) in my Nancy application. There is a single global ConnectionMultiplexer that gets automatically passed around by Nancy's TinyIoC via constructor parameters, and any time I try and use GetDatabase and one of the *Async methods (sync methods only start failing after one of the async methods have been attempted) my application deadlocks.

Looking at my parallel stacks it appears that I have four threads:

  1. The thread that called Result on one of my Tasks that uses SE.R. (Has lots of Nancy stuff on the stack, then a call to my library that utilizes SE.R, and a call to Result. Top of the stack is Monitor.Wait).
  2. A thread that spawned two other threads. I assume this is managed by SE.R. Starts with Native to Managed Transition, ThreadHelper.ThreadStart, and at the top of the stack is ThreadHelper.ThreadStart_Context.
  3. A small stack that is stuck like this:
    • Monitor.Wait
    • Monitor.Wait
    • SocketManager.WriteAllQueues
    • SocketManager.cctor.AnonymousMethod__16
  4. Another small stack that looks like this:
    • Managed to Native Transition
    • SocketManager.ReadImpl
    • SocketManager.Read
    • SocketManager.cctor.AnonymousMethod__19

I'm almost sure this is a deadlock of some sort. I even think it might have something to do with this question. But I have no idea what to do about it.

The ConnectionMultiplexer is set up in a Nancy IRegistrations with the following code:

var configOpts =  new ConfigurationOptions {
  EndPoints = {
    RedisHost,
  },
  Password = RedisPass,
  AllowAdmin = false,
  ClientName = ApplicationName,
  ConnectTimeout = 10000,
  SyncTimeout = 5000,
};
var mux = ConnectionMultiplexer.Connect(configOpts);
yield return new InstanceRegistration(typeof (ConnectionMultiplexer), mux);

mux is the instance that gets shared by all code requesting it in their constructor parameter list.

I have a class called SchemaCache. A small piece of it (that includes the code that throws the error in question) follows:

public SchemaCache(ConnectionMultiplexer connectionMultiplexer) {
    ConnectionMultiplexer = connectionMultiplexer;
}

private ConnectionMultiplexer ConnectionMultiplexer { get; set; }

private async Task<string[]> Cached(string key, bool forceFetch, Func<string[]> fetch) {
    var db = ConnectionMultiplexer.GetDatabase();

    return forceFetch || !await db.KeyExistsAsync(key)
        ? await CacheSetSet(db, key, await Task.Run(fetch))
        : await CacheGetSet(db, key);
}

private static async Task<string[]> CacheSetSet(IDatabaseAsync db, string key, string[] values) {
    await db.KeyDeleteAsync(key);
    await db.SetAddAsync(key, EmptyCacheSentinel);

    var keysSaved = values
        .Append(EmptyCacheSentinel)
        .Select(val => db.SetAddAsync(key, val))
        .ToArray()
        .Append(db.KeyExpireAsync(key, TimeSpan.FromDays(1)));
    await Task.WhenAll(keysSaved);

    return values;
}

private static async Task<string[]> CacheGetSet(IDatabaseAsync db, string key) {
    var results = await db.SetMembersAsync(key);
    return results.Select(rv => (string) rv).Without(EmptyCacheSentinel).ToArray();
}

// There are a bunch of these public methods:
public async Task<IEnumerable<string>> UseCache1(bool forceFetch = false) {
    return await Cached("the_key_i_want", forceFetch, () => {
        using (var cnn = MakeConnectionToDatabase("server", "databaseName")) {
            // Uses Dapper:
            return cnn.Query<string>("--expensive sql query").ToArray();
        }
    });
}

I also have a class that makes uses of this in a method requiring some of the information from the cache:

public OtherClass(SchemaCache cache) {
    Cache = cache;
}

private SchemaCache Cache { get; set; }

public Result GetResult(Parameter parameter) {
    return Cache.UseCache1().Result
        .Where(r => Cache.UseCache2(r).Result.Contains(parameter))
        .Select(r => CheckResult(r))
        .FirstOrDefault(x => x != null);
}

All of the above works fine in LinqPad where there's only one instance of everything in question. Instead it fails with a TimeoutException (and later an exception about no available connection). The only difference is that I get an instance of the cache via dependency injection, and I'm pretty sure Nancy uses Tasks to parallelize the requests.

like image 296
Chris Pfohl Avatar asked Dec 01 '14 18:12

Chris Pfohl


1 Answers

Here we go:

return Cache.UseCache1().Result

boom; deadlock. But nothing to do with StackExchange.Redis.

At least, from most sync-context providers. This is because your awaits are all implicitly requesting sync-context activation - which can mean "on the UI thread" (winforms, WPF), or "on the currently designated worker thread" (WCF, ASP.NET, MVC, etc). The problem here is that this thread will never be available to process those items, because .Result is a synchronous and blocking call. Thus none of your completion callbacks will get processed, because the only thread that can possibly process them is waiting for them to finish before making itself available.

Note: StackExchange.Redis does not use the sync-context; it explicitly disconnects from sync-context to avoid being the cause of deadlocks (this is normal and recommended for libraries). The key point is that your code doesn't.

Options:

  • don't mix async and .Result / .Wait(), or
  • have all of your await calls (or at least those underneath .UseCache1()) explicitly call .ConfigureAwait(false) - note, however, that this means that the completions will not occur in the calling context!

The first option is the simplest; if you can isolate a tree of calls that do not depend on sync-context then the second approach is viable.

This is a very common problem; I did pretty much the same thing.

like image 190
Marc Gravell Avatar answered Sep 19 '22 16:09

Marc Gravell