Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Iterable collection that can be mutated during iteration

Is there a collection data structure in Java (and C# if you know) that can be iterated over, with the following properties:

  • The current element can be removed without affecting the current iterator (the rest of the already-started iterator's iterations).
  • New elements can be added, but will also not affect the current iterator—not be included as an iterated value while the current iterator's iteration is still going. In my case only one new element would be added per iteration, but none should be seen until a new iterator is fetched from the iterable.
  • The order of elements doesn’t matter.

In effect, there is an incoming list and an outgoing list of items. The incoming list is iterated and some are copied to a new list. Some new elements can be added to the new list during iteration. After the iteration is over, the old incoming list is replaced with the new outgoing list. This whole process is itself within a loop.

So, copying the elements to a newly constructed collection object each time seems inefficient compared to one that had these add/remove properties.

I was kind of thinking of some kind of queue that lets me preview the current item and then either dequeue it or not, and move onto the next item. And I can add more items to the head of the queue but won’t see them because I’m moving toward the end. A doubly linked list could have these properties, right?

If you really want to know what it’s for, it’s to soup up the second large code block in an answer of mine.

like image 239
ErikE Avatar asked Sep 03 '25 15:09

ErikE


2 Answers

In java there is the CopyOnWriteArrayList which does what you want: It makes a copy of the backing array every time you change anything. But that does mean any iteration is 'set in stone' once you begin iteration, and therfore you can remove/add to the underlying collection at will without affecting any running iterators whatsoever.

You can also build your own collection type that has this behaviour. It'd be a 3 liner:

public class ConstantIterationArrayList<T> extends ArrayList<T> {
    public Iterator<T> iterator() {
        return new ArrayList<T>(this).iterator();
    }
}

(The above makes a copy of the list and then gives you an iterator for the copy, thus conveniently ensuring any modification to this list has absolutely no effect on that iterator).

Here's the real problem with your question:

The above will make copies of the underlying data store from time to time (my snippet above does so every time you make an iterator. CopyOnWriteArrayList does so every time you call remove() or add()). The operation 'copy the underlying data store' takes O(n) time, as in, it takes twice as long for a list that is twice as large.

ArrayList in general has the property that the remove() operation, unless you are removing an element at or very close to the end of the list, is an O(n) operation: Removing an element from a list takes twice as long if the list is twice as large.

Fortunately, modern CPUs have sizable caches and can work exceedingly fast within a cache page. Which translates to: Despite the fact that copying data over feels inefficient, in practice, as long as the backing array fits within a page or so, it's much faster than data stores based on LinkedList semantics. We're talking about up to ~1000 elements give or take. (Note, in general, almost everything you do to a LinkedList is O(n) and where ArrayList tends to work well with modern CPU architecture, LinkedList tends to do very poorly. The point being: LinkedList is very rarely the right answer either!)

So, if you have no more than ~1000 items in this list, I'd go ahead with CopyOnWriteArrayList or the custom class I wrote for you above.

However, if you have more than that, ArrayList is not the right data store to use here. Even if you forget about your constant iteration needs for now; calling remove() on large array lists is a bad idea (unless removing very close to the end of the list). In this case I'd sketch out precisely which operations you need to do on this data type and exactly which ones really need to be fast, and once you have a complete list, try to find a collection type which fits your needs exactly, and in the (likely) case nothing specific exists that is a perfect match, make one yourself. Like above, when you have to roll your own data type its usually a good idea to let the bulk of the work be done by existing data types, so either extend an existing one, or encapsulate one.

like image 50
rzwitserloot Avatar answered Sep 05 '25 07:09

rzwitserloot


In C# this is easy enough to do with List<T> and for (...) rather than foreach (...):

using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    static class Program
    {
        static void Main()
        {
            List<int> list = Enumerable.Range(1, 10).ToList();

            for (int i = 0; i < list.Count; ++i)
            {
                if ((list[i] % 3) == 0) // Remove multiples of 3.
                    list.RemoveAt(i--); // NOTE: Post-decrement i
                else if ((list[i] % 4) == 0) // At each multiple of 4, add (2*value+1)
                    list.Add(list[i] * 2 + 1);
                else
                    ; // Do nothing.
            }

            Console.WriteLine(string.Join(", ", list)); // Outputs 1, 2, 4, 5, 7, 8, 10, 17
        }
    }
}

The key here is to use an index rather than foreach, and to not change anything before the current index (which from reading your requirements, is not needed).

However if you do need to add or remove elements before the current index, then this approach doesn't work (or at least, it becomes a lot more complicated).

like image 35
Matthew Watson Avatar answered Sep 05 '25 05:09

Matthew Watson