Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why can't IEnumerator's be cloned?

In implementing a basic Scheme interpreter in C# I discovered, to my horror, the following problem:

IEnumerator doesn't have a clone method! (or more precisely, IEnumerable can't provide me with a "cloneable" enumerator).

What I'd like:

interface IEnumerator<T>
{
    bool MoveNext();
    T Current { get; }
    void Reset();
    // NEW!
    IEnumerator<T> Clone();
}

I cannot come up with an implementation of IEnumerable that would not be able to supply an efficiently cloneable IEnumerator (vectors, linked lists, etc. all would be able to provide a trivial implementation of IEnumerator's Clone() as specified above... it would be easier than providing a Reset() method anyway!).

The absence of the Clone method means that any functional/recursive idiom of enumerating over a sequence won't work.

It also means I can't "seamlessly" make IEnumerable's behave like Lisp "lists" (for which you use car/cdr to enumerate recursively). i.e. the only implemention of "(cdr some IEnumerable)" would be woefully inefficient.

Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?

Can anyone suggest a workaround?

like image 813
Paul Hollingsworth Avatar asked Apr 30 '09 16:04

Paul Hollingsworth


4 Answers

The purpose of "clonable" enumerators is mainly to be able save iteration position and be able to return to it later. That means, the iterated container must provide more rich interface than just IEnumerable. It is actually something between IEnumerable and IList. Working with IList you can just use integer index as enumerator, or create a simple immutable wrapping class, holding a reference to the list and current position.

If your container does not support random access and can only be iterated forward (like one-directional linked list), it must at least provide ability to get next element, having a reference to the previous one or to some "iteration state" that you can hold in your iterator. So, the interface can look like this:

interface IIterable<T>
{
    IIterator<T> GetIterator(); // returns an iterator positioned at start
    IIterator<T> GetNext(IIterator<T> prev); // returns an iterator positioned at the next element from the given one
}

interface IIterator<T>
{
    T Current { get; }
    IEnumerable<T> AllRest { get; }
}

Note that the iterator is immutable, it can not be "moved forward", we only can ask our iterable container to give us a new iterator pointing to the next position. The benefit of that is that you can store iterators anywhere as long as you need, for example have a stack of iterators and return to previously saved position when you need. You can save current position for later use by assigning to a variable, just as you would do with an integer index.

The AllRest property can be useful if you need to iterate from the given position to the end of container using standard language iteration features, like foraech or LinQ. It won't change the iterator position (remember, our iterator is immutable). The implementation can repeatedly GetNext and yleid return.

The GetNext method can actually be a part of iterator itself, like that:

interface IIterable<T>
{
    IIterator<T> GetIterator(); // returns an iterator positioned at start
}

interface IIterator<T>
{
    T Current { get; }
    IIterator<T> GetNext { get; } // returns an iterator positioned at the next element from the given one
    IEnumerable<T> AllRest { get; }
}

This is pretty much the same. The logic of determining the next state is just moved from the container implementation to the iterator implementation. Note that the iterator is still immutable. You can not "move it forward", you only can get another one, pointing to the next element.

like image 35
C-F Avatar answered Sep 28 '22 00:09

C-F


If you can let the original enumerator go, ie. not use it any more, you can implement a "clone" function that takes the original enumerator, and uses it as the source for one or more enumerators.

In other words, you could build something like this:

IEnumerable<String> original = GetOriginalEnumerable();
IEnumerator<String>[] newOnes = original.GetEnumerator().AlmostClone(2);
                                                         ^- extension method
                                                         produce 2
                                                         new enumerators

These could internally share the original enumerator, and a linked list, to keep track of the enumerated values.

This would allow for:

  • Infinite sequences, as long as both enumerators progress forward (the linked list would be written such that once both enumerators have passed a specific point, those can be GC'ed)
  • Lazy enumeration, the first of the two enumerators that need a value that hasn't been retrieved from the original enumerator yet, it would obtain it and store it into the linked list before yielding it

Problem here is of course that it would still require a lot of memory if one of the enumerators move far ahead of the other one.

Here is the source code. If you use Subversion, you can download the Visual Studio 2008 solution file with a class library with the code below, as well as a separate unit test porject.

Repository: http://vkarlsen.serveftp.com:81/svnStackOverflow/SO847655
Username and password is both 'guest', without the quotes.

Note that this code is not thread-safe, at all.

public static class EnumeratorExtensions
{
    /// <summary>
    /// "Clones" the specified <see cref="IEnumerator{T}"/> by wrapping it inside N new
    /// <see cref="IEnumerator{T}"/> instances, each can be advanced separately.
    /// See remarks for more information.
    /// </summary>
    /// <typeparam name="T">
    /// The type of elements the <paramref name="enumerator"/> produces.
    /// </typeparam>
    /// <param name="enumerator">
    /// The <see cref="IEnumerator{T}"/> to "clone".
    /// </param>
    /// <param name="clones">
    /// The number of "clones" to produce.
    /// </param>
    /// <returns>
    /// An array of "cloned" <see cref="IEnumerator[T}"/> instances.
    /// </returns>
    /// <remarks>
    /// <para>The cloning process works by producing N new <see cref="IEnumerator{T}"/> instances.</para>
    /// <para>Each <see cref="IEnumerator{T}"/> instance can be advanced separately, over the same
    /// items.</para>
    /// <para>The original <paramref name="enumerator"/> will be lazily evaluated on demand.</para>
    /// <para>If one enumerator advances far beyond the others, the items it has produced will be kept
    /// in memory until all cloned enumerators advanced past them, or they are disposed of.</para>
    /// </remarks>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="enumerator"/> is <c>null</c>.</para>
    /// </exception>
    /// <exception cref="ArgumentOutOfRangeException">
    /// <para><paramref name="clones"/> is less than 2.</para>
    /// </exception>
    public static IEnumerator<T>[] Clone<T>(this IEnumerator<T> enumerator, Int32 clones)
    {
        #region Parameter Validation

        if (Object.ReferenceEquals(null, enumerator))
            throw new ArgumentNullException("enumerator");
        if (clones < 2)
            throw new ArgumentOutOfRangeException("clones");

        #endregion

        ClonedEnumerator<T>.EnumeratorWrapper wrapper = new ClonedEnumerator<T>.EnumeratorWrapper
        {
            Enumerator = enumerator,
            Clones = clones
        };
        ClonedEnumerator<T>.Node node = new ClonedEnumerator<T>.Node
        {
            Value = enumerator.Current,
            Next = null
        };

        IEnumerator<T>[] result = new IEnumerator<T>[clones];
        for (Int32 index = 0; index < clones; index++)
            result[index] = new ClonedEnumerator<T>(wrapper, node);
        return result;
    }
}

internal class ClonedEnumerator<T> : IEnumerator<T>, IDisposable
{
    public class EnumeratorWrapper
    {
        public Int32 Clones { get; set; }
        public IEnumerator<T> Enumerator { get; set; }
    }

    public class Node
    {
        public T Value { get; set; }
        public Node Next { get; set; }
    }

    private Node _Node;
    private EnumeratorWrapper _Enumerator;

    public ClonedEnumerator(EnumeratorWrapper enumerator, Node firstNode)
    {
        _Enumerator = enumerator;
        _Node = firstNode;
    }

    public void Dispose()
    {
        _Enumerator.Clones--;
        if (_Enumerator.Clones == 0)
        {
            _Enumerator.Enumerator.Dispose();
            _Enumerator.Enumerator = null;
        }
    }

    public T Current
    {
        get
        {
            return _Node.Value;
        }
    }

    Object System.Collections.IEnumerator.Current
    {
        get
        {
            return Current;
        }
    }

    public Boolean MoveNext()
    {
        if (_Node.Next != null)
        {
            _Node = _Node.Next;
            return true;
        }

        if (_Enumerator.Enumerator.MoveNext())
        {
            _Node.Next = new Node
            {
                Value = _Enumerator.Enumerator.Current,
                Next = null
            };
            _Node = _Node.Next;
            return true;
        }

        return false;
    }

    public void Reset()
    {
        throw new NotImplementedException();
    }
}
like image 37
Lasse V. Karlsen Avatar answered Sep 28 '22 00:09

Lasse V. Karlsen


As a workaround, you could easily make an extension method for IEnumerator which did your cloning. Just create a list from the enumerator, and use the elements as members.

You'd lose the streaming capabilities of an enumerator, though - since you're new "clone" would cause the first enumerator to fully evaluate.

like image 30
Reed Copsey Avatar answered Sep 28 '22 02:09

Reed Copsey


The logic is inexorable! IEnumerable doesn't support Clone, and you need Clone, so you shouldn't be using IEnumerable.

Or more accurately, you shouldn't be using it as the fundamental basis for work on a Scheme interpreter. Why not make a trivial immutable linked list instead?

public class Link<TValue>
{
    private readonly TValue value;
    private readonly Link<TValue> next;

    public Link(TValue value, Link<TValue> next)
    {
        this.value = value;
        this.next = next;
    } 

    public TValue Value 
    { 
        get { return value; }
    }

    public Link<TValue> Next 
    {
        get { return next; }
    }

    public IEnumerable<TValue> ToEnumerable()
    {
        for (Link<TValue> v = this; v != null; v = v.next)
            yield return v.value;
    }
}

Note that the ToEnumerable method gives you convenient usage in the standard C# way.

To answer your question:

Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?

An IEnumerable can go anywhere in the world for its data. Here's an example that reads lines from the console:

IEnumerable<string> GetConsoleLines()
{
    for (; ;)
        yield return Console.ReadLine();
}

There are two problems with this: firstly, a Clone function would not be particularly straightforward to write (and Reset would be meaningless). Secondly, the sequence is infinite - which is perfectly allowable. Sequences are lazy.

Another example:

IEnumerable<int> GetIntegers()
{
    for (int n = 0; ; n++)
        yield return n;
}

For both these examples, the "workaround" you've accepted would not be much use, because it would just exhaust the available memory or hang up forever. But these are perfectly valid examples of sequences.

To understand C# and F# sequences, you need to look at lists in Haskell, not lists in Scheme.

In case you think the infinite stuff is a red herring, how about reading the bytes from a socket:

IEnumerable<byte> GetSocketBytes(Socket s)
{
    byte[] buffer = new bytes[100];
    for (;;)
    {
        int r = s.Receive(buffer);
        if (r == 0)
            yield break;

        for (int n = 0; n < r; n++)
            yield return buffer[n];       
    }
}

If there is some number of bytes being sent down the socket, this will not be an infinite sequence. And yet writing Clone for it would be very difficult. How would the compiler generate the IEnumerable implementation to do it automatically?

As soon as there was a Clone created, both instances would now have to work from a buffer system that they shared. It's possible, but in practice it isn't needed - this isn't how these kinds of sequences are designed to be used. You treat them purely "functionally", like values, applying filters to them recursively, rather than "imperatively" remembering a location within the sequence. It's a little cleaner than low-level car/cdr manipulation.

Further question:

I wonder, what's the lowest level "primitive(s)" I would need such that anything I might want to do with an IEnumerable in my Scheme interpreter could be implemented in scheme rather than as a builtin.

The short answer I think would be to look in Abelson and Sussman and in particular the part about streams. IEnumerable is a stream, not a list. And they describe how you need special versions of map, filter, accumulate, etc. to work with them. They also get onto the idea of unifying lists and streams in section 4.2.

like image 73
Daniel Earwicker Avatar answered Sep 28 '22 02:09

Daniel Earwicker