Locking on an interned string?

Question

Update: It is acceptable if this method is not thread safe, but I'm interested in learning how I would make it thread safe. Also, I do not want to lock on a single object for all values of key if I can avoid it.

Original Question: Suppose I want to write a higher order function that takes a key and a function, and checks if an object has been cached with the given key. If it has, the cached value is returned. Otherwise, the given function is run and the result is cached and returned.

Here's a simplified version of my code:

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    object cache = HttpContext.Current.Cache.Get(key);
    //clearly not thread safe, two threads could both evaluate the below condition as true
    //what can I lock on since the value of "key" may not be known at compile time?
    if (cache == null)
    {
        T result = fn();
        HttpContext.Current.Cache.Insert(key, result, null, expires, Cache.NoSlidingExpiration);
        return result;
    }
    else
        return (T)cache;
}

Also, suppose I do not know all possible values of key at compile time.

How can I make this thread safe? I know I need to introduce locking here, to prevent more than one thread from evaluating my condition as true, but I don't know what to lock on. Many of the examples I've read about locking (such as Jon Skeet's article) recommend using a "dummy" private variable that's used only for locking. This isn't possible in this case, because keys are unknown at compile time. I know I could trivially make this thread safe by having the same lock be used for every key, but that could be wasteful.

Now, my main question is:

Is it possible to lock on key? Will string interning help here?

After reading ".NET 2.0 string interning inside out", I understand that I can explicitly call String.Intern() to obtain a 1 to 1 mapping from the value of a string to an instance of a string. Is this suitable to lock on? Let's change the above code to:

public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    //check for the scenario where two strings with the same value are stored at different memory locations
    key = String.Intern(key); 
    lock (key) //is this object suitable for locking?
    {
        object cache = HttpContext.Current.Cache.Get(key);
        if (cache == null)
        {
            T result = fn();
            HttpContext.Current.Cache.Insert(key, result, null, expires, Cache.NoSlidingExpiration);
            return result;
        }
        else
            return (T)cache;
    }
}

Is the above implementation thread safe?

Evgeniy Berezovsky · Accepted Answer

Problems with @wsanville's own solution, partly mentioned before:

1. Other parts of your code base might lock on the same interned string instances for different purposes, causing only performance issues if you are lucky, and deadlocks if you are unlucky (potentially only in the future, as the code base grows and is extended by coders unaware of your String.Intern locking pattern). Note that this includes locks on the same interned string even if they are in different AppDomains, potentially leading to cross-AppDomain deadlocks.
2. It's impossible for you to reclaim the interned memory, should you ever want to.
3. String.Intern() is slow.
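The first problem can be illustrated with a small sketch. The class and helper names here (InternSharingDemo, Fresh) are made up for illustration; only String.Intern and ReferenceEquals come from the framework. It shows that two unrelated callers who intern equal strings end up holding the very same object, and therefore the very same monitor if they lock on it.

```csharp
using System;

static class InternSharingDemo
{
    // Hypothetical helper: builds a fresh string instance at run time
    // (so it is not a compile-time literal).
    public static string Fresh(string value)
    {
        return new string(value.ToCharArray());
    }

    static void Main()
    {
        string a = Fresh("user:42");
        string b = Fresh("user:42");

        // Two distinct instances with equal contents.
        Console.WriteLine(ReferenceEquals(a, b));                               // False

        // After interning, both resolve to one shared instance, so
        // lock (String.Intern(a)) and lock (String.Intern(b)) contend on one monitor,
        // no matter which part of the code base took the lock.
        Console.WriteLine(ReferenceEquals(string.Intern(a), string.Intern(b))); // True
    }
}
```

Any library that happens to lock on an interned string with the same value would take part in that contention, which is what makes the pattern fragile.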

To address all these 3 issues, you could implement your own Intern() that you tie to your specific locking purpose, i.e. do not use it as a global, general-purpose string interner:

private static readonly ConcurrentDictionary<string, string> concSafe = 
    new ConcurrentDictionary<string, string>();
static string InternConcurrentSafe(string s)
{
    return concSafe.GetOrAdd(s, String.Copy);
}

I called this method ...Safe(), because when interning I will not store the passed in String instance, as that might e.g. be an already interned String, making it subject to the problems mentioned in 1. above.
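To see what the copy buys you, here is a minimal sketch under a few assumptions: the names PrivateInternDemo, Pool and InternPrivate are hypothetical, and new string(k.ToCharArray()) is swapped in for String.Copy, which is marked obsolete in recent .NET versions. The point is only that the private pool hands out an instance that nobody else can obtain through String.Intern.

```csharp
using System;
using System.Collections.Concurrent;

public static class PrivateInternDemo
{
    private static readonly ConcurrentDictionary<string, string> Pool =
        new ConcurrentDictionary<string, string>();

    // Stores a fresh copy, never the caller's instance
    // (same idea as passing String.Copy to GetOrAdd).
    public static string InternPrivate(string s)
    {
        return Pool.GetOrAdd(s, k => new string(k.ToCharArray()));
    }

    public static void Main()
    {
        // A globally interned string, as some other code might hold it.
        string global = string.Intern(string.Concat("user:", 42.ToString()));
        string mine = InternPrivate(global);

        Console.WriteLine(mine == global);                // True: same characters
        Console.WriteLine(ReferenceEquals(mine, global)); // False: a separate lock object

        // A second lookup with an equal string returns the same private instance.
        string again = InternPrivate(string.Concat("user:", 42.ToString()));
        Console.WriteLine(ReferenceEquals(mine, again));  // True
    }
}
```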

To compare the performance of various ways of interning strings, I also tried the following 2 methods, as well as String.Intern.

private static readonly ConcurrentDictionary<string, string> conc = 
    new ConcurrentDictionary<string, string>();
static string InternConcurrent(string s)
{
    return conc.GetOrAdd(s, s);
}

private static readonly Dictionary<string, string> locked = 
    new Dictionary<string, string>(5000);
static string InternLocked(string s)
{
    string interned;
    lock (locked)
        if (!locked.TryGetValue(s, out interned))
            interned = locked[s] = s;
    return interned;
}

Benchmark

100 threads, each randomly selecting one of 5000 different strings (each containing 8 digits) 50000 times and then calling the respective intern method. All values were measured after sufficient warm-up, on Windows 7 64-bit with a 4-core i5.

N.B. Because of the warm-up, the measured runs perform no writes to the respective interning dictionaries, only reads. That is what I was interested in for the use case at hand, but different write/read ratios will probably affect the results.
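A harness along the following lines could reproduce the setup described above. It is a sketch and not the original benchmark code: the class name InternBenchmark, the Measure helper, the fixed seeds and the way the keys are generated are all assumptions, and it times only a ConcurrentDictionary-based interner. Other methods can be measured by passing a different delegate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public static class InternBenchmark
{
    // Runs `intern` from threadCount threads, `iterations` times each,
    // and returns the elapsed wall-clock time in milliseconds.
    public static long Measure(Func<string, string> intern, string[] keys,
                               int threadCount, int iterations)
    {
        Thread[] threads = Enumerable.Range(0, threadCount).Select(t => new Thread(() =>
        {
            var rng = new Random(t); // one Random per thread; Random is not thread safe
            for (int i = 0; i < iterations; i++)
                intern(keys[rng.Next(keys.Length)]);
        })).ToArray();

        Stopwatch sw = Stopwatch.StartNew();
        foreach (Thread th in threads) th.Start();
        foreach (Thread th in threads) th.Join();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        var rng = new Random(1);
        // 5000 eight-digit strings; random generation may produce a few duplicates.
        string[] keys = Enumerable.Range(0, 5000)
                                  .Select(_ => rng.Next(0, 100000000).ToString("D8"))
                                  .ToArray();

        var pool = new ConcurrentDictionary<string, string>();
        Func<string, string> intern = s => pool.GetOrAdd(s, s);

        Measure(intern, keys, 100, 50000); // warm-up: fills the pool, result discarded
        Console.WriteLine(Measure(intern, keys, 100, 50000) + " ms");
    }
}
```

Absolute numbers from such a harness depend heavily on the machine and runtime, so they are only comparable to each other within one run.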

Results

String.Intern(): 2032 ms
InternLocked(): 1245 ms
InternConcurrent(): 458 ms
InternConcurrentSafe(): 453 ms

The fact that InternConcurrentSafe is as fast as InternConcurrent makes sense in light of the fact that these figures are after warming up (see above N.B.), so there are in fact no or only a few invocations of String.Copy during the test.

In order to properly encapsulate this, create a class like this:

public class StringLocker
{
    private readonly ConcurrentDictionary<string, string> _locks =
        new ConcurrentDictionary<string, string>();

    public string GetLockObject(string s)
    {
        return _locks.GetOrAdd(s, String.Copy);
    }
}

and after instantiating one StringLocker for every use case you might have, it is as easy as calling

lock (myStringLocker.GetLockObject(s))
{
    ...
}

N.B.

Thinking again, there's no need to return an object of type string if all you want to do is lock on it, so copying the characters is totally unnecessary, and the following would perform better than the class above.

public class StringLocker
{
    private readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();

    public object GetLockObject(string s)
    {
        return _locks.GetOrAdd(s, k => new object());
    }
}
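Putting this together with the original question, a per-key lock combined with a second check inside the lock might look like the sketch below. To keep it runnable outside ASP.NET, a ConcurrentDictionary stands in for HttpContext.Current.Cache and expiration is left out; those simplifications, and the names KeyedCacheSketch, Store and Locks, are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class KeyedCacheSketch
{
    // Stand-in for HttpContext.Current.Cache (assumption, so the sketch runs anywhere).
    private static readonly ConcurrentDictionary<string, object> Store =
        new ConcurrentDictionary<string, object>();

    // One private lock object per key, as in the object-based StringLocker above.
    private static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>();

    public static T CheckCache<T>(string key, Func<T> fn)
    {
        object cached;
        if (Store.TryGetValue(key, out cached))
            return (T)cached; // fast path, no lock taken

        lock (Locks.GetOrAdd(key, k => new object()))
        {
            // Re-check: another thread may have filled the entry while we waited.
            if (Store.TryGetValue(key, out cached))
                return (T)cached;

            T result = fn();
            Store[key] = result;
            return result;
        }
    }

    public static void Main()
    {
        int calls = 0;
        // 16 concurrent requests for the same key.
        Parallel.For(0, 16, i =>
            CheckCache("answer", () =>
            {
                Interlocked.Increment(ref calls);
                Thread.Sleep(50); // simulate an expensive computation
                return 42;
            }));
        Console.WriteLine(calls); // 1: the factory ran once
    }
}
```

Requests for different keys take different lock objects and do not block each other, which is the granularity the question asked for. The price is that the lock dictionary grows with the number of distinct keys.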

LukeH · Answer

A variant of Daniel's answer...

Rather than creating a new lock object for every single string you could share a small-ish set of locks, choosing which lock to use depending on the string's hashcode. This will mean less GC pressure if you potentially have thousands, or millions, of keys, and should allow enough granularity to avoid any serious blocking (perhaps after a few tweaks, if necessary).

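public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
{
    // fast path: no lock needed when the value is already cached
    object cached = HttpContext.Current.Cache[key];
    if (cached != null)
        return (T)cached;

    int stripeIndex = (key.GetHashCode() & 0x7FFFFFFF) % _stripes.Length;

    lock (_stripes[stripeIndex])
    {
        // re-check inside the lock: another thread may have populated
        // the entry while this one was waiting
        cached = HttpContext.Current.Cache[key];
        if (cached != null)
            return (T)cached;

        T result = fn();
        HttpContext.Current.Cache.Insert(key, result, null, expires,
                                         Cache.NoSlidingExpiration);
        return result;
    }
}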

// share a set of 32 locks
private static readonly object[] _stripes = Enumerable.Range(0, 32)
                                                      .Select(x => new object())
                                                      .ToArray();

This will allow you to tweak the locking granularity to suit your particular needs just by changing the number of elements in the _stripes array. (However, if you need close to one-lock-per-string granularity then you're better off going with Daniel's answer.)
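The stripe selection can also be pulled out into a small reusable class. The following is a sketch with hypothetical names (StripedLocks, IndexFor, For) that shows the two properties the approach relies on: the index is never negative, and equal keys always map to the same lock object within a process.

```csharp
using System;
using System.Linq;

public sealed class StripedLocks
{
    private readonly object[] _stripes;

    public StripedLocks(int stripeCount)
    {
        if (stripeCount <= 0)
            throw new ArgumentOutOfRangeException("stripeCount");
        _stripes = Enumerable.Range(0, stripeCount)
                             .Select(x => new object())
                             .ToArray();
    }

    public int IndexFor(string key)
    {
        // Clearing the sign bit keeps the index non-negative
        // even when GetHashCode() returns a negative value.
        return (key.GetHashCode() & 0x7FFFFFFF) % _stripes.Length;
    }

    public object For(string key)
    {
        return _stripes[IndexFor(key)];
    }
}

public static class StripedLocksDemo
{
    public static void Main()
    {
        var locks = new StripedLocks(32);
        string a = string.Concat("user:", 42.ToString());
        string b = string.Concat("user:", 42.ToString());

        // Equal keys hash equally, so they share one stripe.
        Console.WriteLine(ReferenceEquals(locks.For(a), locks.For(b))); // True
    }
}
```

Different keys may also land on the same stripe, so unrelated keys occasionally wait for each other. Raising the stripe count reduces how often that happens.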

Oded · Answer

Never lock on strings, and in particular not on those that are interned. See this blog entry on the danger of locking on interned strings.

Just create a new object and lock on that:

object myLock = new object();

Tags: c#, thread-safety, locking, higher-order-functions

Asked by wsanville
