Consider the following code:
List<int> list = new List<int>();
IEnumerable<int> enumerable = list;
IEnumerator<int> enumerator = enumerable.GetEnumerator();
list.Add(1);
bool any = enumerator.MoveNext();
At runtime, the last line throws an:
InvalidOperationException: Collection was modified; enumeration operation may not execute.
I understand the need for IEnumerators
to throw 'Collection was modified' exceptions when the IEnumerable
changes, but I don't understand this:
Why does the IEnumerator
throw this exception on the first call of MoveNext()
? Since the IEnumerator
doesn't represent the state of the IEnumerable
until MoveNext()
is called for the first time, why can't it start tracking changes from the first MoveNext()
instead of from GetEnumerator()
?
Probably because the rule "an Enumerator is invalidated if the underlying collection is modified" is simpler than the rule "an Enumerator is invalidated if the underlying collection is modified after the first call to MoveNext". Or it's just the way it's implemented. Plus, it's just reasonable to assume that an Enumerator represents the state of the underlying collection at the moment the Enumerator was created, and relying on a different behavior is likely to be a source of bugs.
I feel like a quick recap of iterators is needed.
An iterator (IEnumerator and IEnumerable for C#) is used to access elements of a structure in an ordered manner without exposing the underlying representation. The consequence is that it allows you to have extemely generic functions such as the following.
void Iterator<T, V>(T collection, Action<V> actor) where T : IEnumerable<V>
{
foreach (V value in collection)
actor(value);
}
//Or the more verbose way
void Iterator<T, V>(T collection, Action<V> actor) where T : IEnumerable<V>
{
using (var iterator = collection.GetEnumerator())
{
while (iterator.MoveNext())
actor(iterator.Current);
}
}
//Or if you need to support non-generic collections (ArrayList, Queue, BitArray, etc)
void Iterator<T, V> (T collection, Action<V> actor) where T : IEnumerable
{
foreach (object value in collection)
actor((V)value);
}
There are trade-offs, as can be seen in the C# specification.
5.3.3.16 Foreach statements
foreach ( type identifier in expr ) embedded-statement
The definite assignment state of v at the beginning of expr is the same as the state of v at the beginning of stmt.
The definite assignment state of v on the control flow transfer to embedded-statement or to the end point of stmt is the same as the state of v at the end of expr.
Which simply means that values are read-only. Why are they read-only? It's simple. Since foreach
is such a high level statement, it can't and won't assume anything about the container you are iterating over. What if you were iterating over a binary tree and decided to randomly assign values inside the foreach statement. If foreach
didn't force read-only access then your binary tree would degenerate into a tree. The entire data structure would be in disarray.
But this wasn't your original question. You were modifying the collection before you even accessed the first element and an error was thrown. Why? For this, I dug into the List class using ILSpy. Here's a snippet of the List class
public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
{
private int _version;
public struct Enumerator : IEnumerator<T>, IDisposable, IEnumerator
{
private List<T> list;
private int version;
private int index;
internal Enumerator(List<T> list)
{
this.list = list;
this.version = list._version;
this.index = 0;
}
/* All the implemented functions of IEnumerator<T> and IEnumerator will throw
a ThrowInvalidOperationException if (this.version != this.list._version) */
}
}
The enumerator is initialized with the parent list's "version" and a reference to the parent list. All iterating operations check to make sure that the initial version is equivalent to the referenced list's current version. If they are out of synch, the iterator is no longer valid. Why does the BCL do this? Why didn't the implementers check if the index of the enumerator was 0 (representing a new enumerator), and if it was, simply resynch the versions? I'm not sure. I can only hypothesize that the team wanted conformity amongst all the classes that implemented IEnumerable and they also wanted to keep it simple. Therefore a List's enumerator (and I believe most others) do not discriminate between elements as long as they are within range.
This is the root cause of your problem. If you absolutely must have this functionality, then you will have to implement your own iterator and you may end up having to implement your own List. In my opinion, way too much work to against the flow of the BCL.
Here's a quote from the GoF when designing an iterator that the BCL team probably followed:
It can be dangerous to modify an aggregate while you're traversing it. If elements are added or deleted from the aggregate, you might end up accessing an element twice or missing it completely. A simple solution is to copy the aggregate and traverse the copy, but that's too expensive to do in general
The BCL team most likely decided that it was too expensive in space-time complexity and manpower. And this philosophy is seen throughout C#. It is probably too expensive to allow modification of variables inside a foreach, too expensive to have List's Enumerator discriminate where it's in the list, and too expensive to cradle the user. Hopefully I've explained it well enough that one can seen the power and the constraint of iterators.
Reference:
What changes the "version" of a list and thus invalidates all current enumerators?
Add
AddRange
Clear
Insert
InsertRange
RemoveAll
RemoveAt
RemoveRange
Reverse
Sort
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With