Why does Enumerable.ToArray<T>() use an intermediate Buffer when it could just call Count() first?

I was reading through a question asking "Is it better to call ToList() or ToArray() in LINQ queries?" and found myself wondering why Enumerable.ToArray() doesn't simply call Count() first to find the size of the collection, instead of using the internal Buffer<T> class, which resizes itself dynamically. Something like the following:

T[] ToArray<T>(IEnumerable<T> source)
{
    var count = source.Count();
    var array = new T[count];

    int index = 0;
    foreach (var item in source) array[index++] = item;
    return array;
}

I know we can't read the minds of the designers and implementers, and I'm sure they're much smarter than I am. So the best way to ask this question is: what's wrong with the approach shown above? It seems to allocate less memory and still runs in O(n) time.

Anthony asked Jan 13 '23 02:01


2 Answers

First, the Buffer<T> constructor already optimizes for the case where the sequence can be cast to ICollection<T> (such as an array or a list), which has a Count property:

TElement[] array = null;
int num = 0;
ICollection<TElement> collection = source as ICollection<TElement>;
if (collection != null)
{
    num = collection.Count;
    if (num > 0)
    {
        array = new TElement[num];
        collection.CopyTo(array, 0);
    }
}
else
    // now we are going the long way ...

So if the source is not a collection, the query has to be executed just to get the total count. Calling Enumerable.Count only to size the array correctly can be very expensive and, more importantly, can have dangerous side effects, because the sequence is then enumerated twice. Hence it is unsafe.

Consider this simple File.ReadLines example:

var lines = File.ReadLines(path);
int count = lines.Count(); // executes the query which also disposes the underlying IO.TextReader 
var array = new string[count];
int index = 0;
foreach (string line in lines) array[index++] = line;

This will throw an ObjectDisposedException ("Cannot read from a closed TextReader"): lines.Count() has already executed the query, so the underlying reader has been disposed by the time the foreach runs.
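
The same hazard can be reproduced without any file I/O. The following is only a minimal sketch: the names DoubleEnumerationDemo, Runs, Numbers and CountThenCopy are made up for illustration, and CountThenCopy is just the approach from the question. It records how many times a lazy sequence is actually executed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DoubleEnumerationDemo
{
    // Hypothetical counter: incremented each time the iterator body starts running.
    public static int Runs;

    // A lazy sequence whose execution has a visible side effect.
    public static IEnumerable<int> Numbers()
    {
        Runs++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // The approach from the question: Count() first, then fill the array.
    public static T[] CountThenCopy<T>(IEnumerable<T> source)
    {
        var array = new T[source.Count()];                  // first pass over the source
        int index = 0;
        foreach (var item in source) array[index++] = item; // second pass over the source
        return array;
    }
}
```

After CountThenCopy(DoubleEnumerationDemo.Numbers()) the counter has increased by 2, whereas Numbers().ToArray() increases it by only 1. For a database query or a network stream, that second run means doing the work twice, and it may not even return the same number of items.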

Tim Schmelter answered Jan 15 '23 15:01


The Buffer<T> class has an optimization for the case where the source sequence implements ICollection<T>:

internal Buffer(IEnumerable<TElement> source)
{
   int length = 0;
   TElement[] array = null;
   ICollection<TElement> collection = source as ICollection<TElement>;
   if (collection != null)
   {
      length = collection.Count;
      if (length > 0)
      {
         array = new TElement[length];
         collection.CopyTo(array, 0);
      }
   }
   else
   {
      ...

If the sequence doesn't implement ICollection<T>, the code cannot assume that it's safe to enumerate the sequence twice, so it falls back to resizing the array as required.
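
The fallback path is elided in the snippet above, so the following is only a rough sketch of what "resizing the array as required" can look like, not the framework's actual code. The class name SinglePassBuffer, the method name ToArraySinglePass, the starting capacity of 4 and the doubling growth are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassBuffer
{
    // Hypothetical helper: copies a sequence into an array while enumerating it exactly once.
    public static T[] ToArraySinglePass<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        T[] items = null;
        int count = 0;

        foreach (T item in source)
        {
            if (items == null)
            {
                items = new T[4];                            // assumed starting capacity
            }
            else if (items.Length == count)
            {
                Array.Resize(ref items, checked(count * 2)); // assumed growth policy: double when full
            }
            items[count++] = item;
        }

        if (count == 0) return new T[0];

        // Trim the unused tail so the result has exactly 'count' elements.
        if (items.Length != count) Array.Resize(ref items, count);
        return items;
    }
}
```

The price is a few extra allocations and copies while the array grows, which is the overhead the question asks about; in exchange, the source is never enumerated more than once.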

Richard Deeming answered Jan 15 '23 16:01