Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List with non-null elements ends up containing null. A synchronization issue?

First of all, sorry about the title -- I couldn't figure out one that was short and clear enough.

Here's the issue: I have a list List<MyClass> list to which I always add newly-created instances of MyClass, like this: list.Add(new MyClass()). I don't add elements any other way.

However, then I iterate over the list with foreach and find that there are some null entries. That is, the following code:

foreach (MyClass entry in list)
    if (entry == null)
         throw new Exception("null entry!");

will sometimes throw an exception. I should point out that the list.Add(new MyClass()) are performed from different threads running concurrently. The only thing I can think of to account for the null entries is the concurrent accesses. List<> isn't thread-safe, after all. Though I still find it strange that it ends up containing null entries, instead of just not offering any guarantees on ordering.

Can you think of any other reason?

Also, I don't care in which order the items are added, and I don't want the calling threads to block waiting to add their items. If synchronization is truly the issue, can you recommend a simple way to call the Add method asynchronously, i.e., create a delegate that takes care of that while my thread keeps running its code? I know I can create a delegate for Add and call BeginInvoke on it. Does that seem appropriate?

Thanks.


EDIT: A simple solution based on Kevin's suggestion:

public class AsynchronousList<T> : List<T> {

    private AddDelegate addDelegate;
    public delegate void AddDelegate(T item);

    public AsynchronousList() {
        addDelegate = new AddDelegate(this.AddBlocking);
    }

    public void AddAsynchronous(T item) {
        addDelegate.BeginInvoke(item, null, null);
    }

    private void AddBlocking(T item) {
        lock (this) {
            Add(item);
        }
    }
}

I only need to control Add operations and I just need this for debugging (it won't be in the final product), so I just wanted a quick fix.

Thanks everyone for your answers.

like image 535
Alix Avatar asked Apr 26 '10 13:04

Alix


3 Answers

List<T> can only support multiple readers concurrently. If you are going to use multiple threads to add to the list, you'll need to lock the object first. There is really no way around this, because without a lock you can still have someone reading from the list while another thread updates it (or multiple objects trying to update it concurrently also).

http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx

Your best bet probably is to encapsulate the list in another object, and have that object handle the locking and unlocking actions on the internal list. That way you could make your new object's "Add" method asynchronous and let the calling objects go on their merry way. Any time you read from it though you'll most likely still have to wait on any other objects finishing their updates though.

like image 103
kemiller2002 Avatar answered Oct 07 '22 16:10

kemiller2002


The only thing I can think of to account for the null entries is the concurrent accesses. List<> isn't thread-safe, after all.

That's basically it. We are specifically told it's not thread-safe, so we shouldn't be surprised that concurrent access results in contract-breaking behaviour.

As to why this specific problem occurs, we can but speculate, since List<>'s private implementation is, well, private (I know we have Reflector and Shared Source - but in principle it is private). Suppose the implementation involves an array and a 'last populated index'. Suppose also that 'Add an item' looks like this:

  • Ensure the array is big enough for another item
  • last populated index <- last populated index + 1
  • array[last populated index] = incoming item

Now suppose there are two threads calling Add. If the interleaved sequence of operations ends up like this:

  • Thread A : last populated index <- last populated index + 1
  • Thread B : last populated index <- last populated index + 1
  • Thread A : array[last populated index] = incoming item
  • Thread B : array[last populated index] = incoming item

then not only will there be a null in the array, but also the item that thread A was trying to add won't be in the array at all!

Now, I don't know for sure how List<> does its stuff internally. I have half a memory that it is with an ArrayList, which internally uses this scheme; but in fact it doesn't matter. I suspect that any list mechanism that expects to be run non-concurrently can be made to break with concurrent access and a sufficiently 'unlucky' interleaving of operations. If we want thread-safety from an API that doesn't provide it, we have to do some work ourselves - or at least, we shouldn't be surprised if the API sometimes breaks its when we don't.

For your requirement of

I don't want the calling threads to block waiting to add their item

my first thought is a Multiple-Producer-Single-Consumer queue, wherein the threads wanting to add items are the producers, which dispatch items to the queue async, and there is a single consumer which takes items off the queue and adds them to the list with appropriate locking. My second thought is that this feels as if it would be heavier than this situation warrants, so I'll let it mull for a bit.

like image 32
AakashM Avatar answered Oct 07 '22 16:10

AakashM


If you're using .NET Framework 4, you might check out the new Concurrent Collections. When it comes to threading, it's better not to try to be clever, as it's extremely easy to get it wrong. Synchronization can impact performance, but the effects of getting threading wrong can also result in strange, infrequent errors that are a royal pain to track down.

If you're still using Framework 2 or 3.5 for this project, I recommend simply wrapping your calls to the list in a lock statement. If you're concerned about performance of Add (are you performing some long-running operation using the list somewhere else?) then you can always make a copy of the list within a lock and use that copy for your long-running operation outside the lock. Simply blocking on the Adds themselves shouldn't be a performance issue, unless you have a very large number of threads. If that's the case, you can try the Multiple-Producer-Single-Consumer queue that AakashM recommended.

like image 41
Dan Bryant Avatar answered Oct 07 '22 14:10

Dan Bryant