Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is it possible that "RemoveAll" in LINQ is much faster than iteration?

The following code:

List<Interval> intervals = new List<Interval>();
List<int> points = new List<int>();

//Initialization of the two lists
// [...]

foreach (var point in points)
{
    intervals.RemoveAll (x => x.Intersects (point));
}

is at least 100x faster than this when the lists are of size ~10000:

List<Interval> intervals = new List<Interval>();
List<int> points = new List<int>();

//Initialization of the two lists
// [...]

foreach (var point in points)
{
    for (int i = 0; i < intervals.Count;)
    {
        if (intervals[i].Intersects(point))
        {
            intervals.Remove(intervals[i]);
        }
        else
        {
            i++;
        }
    }
}

How is it possible? What is performed under the hood with "RemoveAll"? According to MSDN, "RemoveAll" performs a linear search and is therefore in O(n). So I would expect similar performance for both.

When replacing "Remove" by "RemoveAt", the iteration is much faster, comparable to "RemoveAll". But both "Remove" and "RemoveAt" have O(n) complexity, so why is the performance difference between them so big? Could it only be due to the fact that "Remove (item)" compares the list elements with "item" and "RemoveAt" doesn't perform any comparison?

like image 376
Brainless Avatar asked Sep 02 '15 11:09

Brainless


3 Answers

If you remove an item from a List<T>, all the items after it will be moved back one spot. So if you remove n items, a lot of items will be moved n times.
RemoveAll will only do the moving once, which you can see in the source for List<T>: source

Another thing is that Remove(T item) will search the entire List for the item, so that's another n operations.

Something that has nothing to do with your question, but I'd like to point out anyway:
If you use a for-loop to delete items from a List, it's a lot easier to start at the end:

for (int i = intervals.Count - 1; i >= 0; i--)
{
    if (intervals[i].Intersects(point))
    {
        intervals.RemoveAt(i);
    }
}

This way, you don't need that ugly else-clause

like image 127
Dennis_E Avatar answered Nov 05 '22 14:11

Dennis_E


RemoveAll can be done in O(n) by checking the condition for n elements and moving at most n elements.

Your loop is O(n^2), as each Remove needs to check up to n elements. And even if you change it to RemoveAt, it still needs to move up to n elements.

This might be the fastest solution: intervals.RemoveAll(x => points.Any(x.Intersects));

like image 30
Henrik Avatar answered Nov 05 '22 12:11

Henrik


List is an array, and removing one element from an array requires moving all elements after the one you're removing to the previous index, so a[i] is moved to a[i-1].

Doing this repeatedly requires multiple moves, even if more elements meet the removal criteria. RemoveAll may optimize this by moving the elements by more than 1 index at a time as it traverses the list and finds more elements that match the removal criteria.

like image 33
Alex Avatar answered Nov 05 '22 12:11

Alex