Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Must IList be finite?

Must .NET's IList be finite? Suppose I write a class FibonacciList implementing IList<BigInteger>

  • The property Item[n] returns the nth Fibonacci number.
  • The property IsReadOnly returns true.
  • The methods IndexOf and Contains we can implement easily enough because the Fibonacci sequence is increasing - to test if the number m is Fibonacci, we need only to compute the finite sequence of Fibonacci numbers up to m.
  • The method GetEnumerator() doing the right thing

We've now implemented all the methods expected of read-only ILists except Count().

Is this cool, or an abuse of IList?


Fibonacci numbers get impractically big quickly (hence IList<BigInteger> above) . A bounded infinite sequence might be more sensible, it could implement IList<long> or IList<double>.

Addendum II: Fibonacci sequence may have been a bad example, because computing distant values is expensive - to find the nth value one has to compute all earlier values. Thus as Mošmondor said, one might as well make it an IEnumerable and use .ElementAt. However there exist other sequences where one can compute distant values quickly without computing earlier values. (Surprisingly the digits of pi are such a sequence). These sequences are more 'listy', they truly support random access.

Edit: No-one argues against infinite IEnumerables. How do they handle Count()?

like image 675
Colonel Panic Avatar asked Jul 11 '12 15:07

Colonel Panic


1 Answers

To most developers, IList and ICollection imply that you have a pre-evaluated, in-memory collection to work with. With IList specifically, there is an implicit contract of constant-time Add* and indexing operations. This is why LinkedList<T> does not implement IList<T>. I would consider a FibonacciList to be a violation of this implied contract.

Note the following paragraph from a recent MSDN Magazine article discussing the reasons for adding read-only collection interfaces to .NET 4.5:

IEnumerable<T> is sufficient for most scenarios that deal with collections of types, but sometimes you need more power than it provides:

  • Materialization: IEnumerable<T> does not allow you to express whether the collection is already available (“materialized”) or whether it’s computed every time you iterate over it (for example, if it represents a LINQ query). When an algorithm requires multiple iterations over the collection, this can result in performance degradation if computing the sequence is expensive; it can also cause subtle bugs because of identity mismatches when objects are being generated again on subsequent passes.

As others have pointed out, there is also the question of what you would return for .Count.

It's perfectly fine to use IEnumerable or IQueryable in for such collections of data, because there is an expectation that these types can be lazily evaluated.

Regarding Edit 1: .Count() is not implemented by the IEnumerable<T> interface: it is an extension method. As such, developers need to expect that it can take any amount of time, and they need to avoid calling it in cases where they don't actually need to know the number of items. For example, if you just want to know whether an IEnumerable<T> has any items, it's better to use .Any(). If you know that there's a maximum number of items you want to deal with, you can use .Take(). If a collection has more than int.MaxValue items in it, .Count() will encounter an operation overflow. So there are some workarounds that can help to reduce the danger associated with infinite sequences. Obviously if programmers haven't taken these possibilities into account, it can still cause problems, though.

Regarding Edit 2: If you're planning to implement your sequence in a way that indexing is constant-time, that addresses my main point pretty handily. Sixlettervariables's answer still holds true, though.

*Obviously there's more to this: Add is only expected to work if IList.IsFixedSize returns false. Modification is only possible if IsReadOnly returns false, etc. IList was a poorly-thought-out interface in the first place: a fact which may finally be remedied by the introduction of read-only collection interfaces in .NET 4.5.

Update

Having given this some additional thought, I've come to the personal opinion that IEnumerable<>s should not be infinite either. In addition to materializing methods like .ToList(), LINQ has several non-streaming operations like .OrderBy() which must consume the entire IEnumerable<> before the first result can be returned. Since so many methods assume IEnumerable<>s are safe to traverse in their entirety, it would be a violation of the Liskov Substitution Principle to produce an IEnumerable<> that is inherently unsafe to traverse indefinitely.

If you find that your application often requires segments of the Fibonacci sequence as IEnumerables, I'd suggest creating a method with a signature similar to Enumerable.Range(int, int), which allows the user to define a starting and ending index.

If you'd like to embark on a Gee-Whiz project, you could conceivably develop a Fibonacci-based IQueryable<> provider, where users could use a limited subset of LINQ query syntax, like so:

// LINQ to Fibonacci!
var fibQuery = from n in Fibonacci.Numbers // (returns an IQueryable<>)
               where n.Index > 5 && n.Value < 20000
               select n.Value;
var fibCount = fibQuery.Count();
var fibList = fibQuery.ToList();

Since your query provider would have the power to evaluate the where clauses as lambda expressions, you could have enough control to implement Count methods and .GetEnumerator() in a way as to ensure that the query is restrictive enough to produce a real answer, or throw an exception as soon as the method is called.

But this reeks of being clever, and would probably be a really bad idea for any real-life software.

like image 163
StriplingWarrior Avatar answered Nov 15 '22 15:11

StriplingWarrior