Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Java's ArrayList's remove function seem to cost so little?

People also ask

Does ArrayList remove reduce size?

There is no operation to decrease capacity of ArrayList's remove() method.

What happens when you remove from ArrayList?

If you call ArrayList. remove() the element at the given index is removed from the list (and all elements following it move down one index). The object itself still exists, no memory has been freed yet.

What is one main disadvantage of an ArrayList?

A possible disadvantage of ArrayList is that it holds only object types and not primitive types (eg, int ). To use a primitive type in an ArrayList, put it inside an object or use of the wrapper classes (eg, Integer, Double, Character, ...).

Why ArrayList is not good for manipulation?

2) Manipulation with ArrayList is slow because it internally uses an array. If any element is removed from the array, all the other elements are shifted in memory. Manipulation with LinkedList is faster than ArrayList because it uses a doubly linked list, so no bit shifting is required in memory.


I ran a benchmark, trying each of the following strategies for filtering the list elements:

  • Copy the wanted elements into a new list
  • Use Iterator.remove() to remove the unwanted elements from an ArrayList
  • Use Iterator.remove() to remove the unwanted elements from a LinkedList
  • Compact the list in-place (moving the wanted elements to lower positions)
  • Remove by index (List.remove(int)) on an ArrayList
  • Remove by index (List.remove(int)) on a LinkedList

Each time I populated the list with 100000 random instances of Point and used a filter condition (based on the hash code) that would accept 95% of elements and reject the remaining 5% (the same proportion stated in the question, but with a smaller list because I didn't have time to run the test for 250000 elements.)

And the average times (on my old MacBook Pro: Core 2 Duo, 2.2GHz, 3Gb RAM) were:

CopyIntoNewListWithIterator   :      4.24ms
CopyIntoNewListWithoutIterator:      3.57ms
FilterLinkedListInPlace       :      4.21ms
RandomRemoveByIndex           :    312.50ms
SequentialRemoveByIndex       :  33632.28ms
ShiftDown                     :      3.75ms

So removing elements by index from a LinkedList was more than 300 times more expensive than removing them from an ArrayList, and probably somewhere between 6000-10000 times more expensive than the other methods (that avoid linear search and arraycopy)

Here there doesn't seem to be much difference between the four faster methods, but I ran just those four again with a 500000-element list with the following results:

CopyIntoNewListWithIterator   :     92.49ms
CopyIntoNewListWithoutIterator:     71.77ms
FilterLinkedListInPlace       :     15.73ms
ShiftDown                     :     11.86ms

I'm guessing that with the larger size cache memory becomes the limiting factor, so the cost of creating a second copy of the list becomes significant.

Here's the code:

import java.awt.Point;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class ListBenchmark {

    public static void main(String[] args) {
        Random rnd = new SecureRandom();
        Map<String, Long> timings = new TreeMap<String, Long>();
        for (int outerPass = 0; outerPass < 10; ++ outerPass) {
            List<FilterStrategy> strategies =
                Arrays.asList(new CopyIntoNewListWithIterator(),
                              new CopyIntoNewListWithoutIterator(),
                              new FilterLinkedListInPlace(),
                              new RandomRemoveByIndex(),
                              new SequentialRemoveByIndex(),
                              new ShiftDown());
            for (FilterStrategy strategy: strategies) {
                String strategyName = strategy.getClass().getSimpleName();
                for (int innerPass = 0; innerPass < 10; ++ innerPass) {
                    strategy.populate(rnd);
                    if (outerPass >= 5 && innerPass >= 5) {
                        Long totalTime = timings.get(strategyName);
                        if (totalTime == null) totalTime = 0L;
                        timings.put(strategyName, totalTime - System.currentTimeMillis());
                    }
                    Collection<Point> filtered = strategy.filter();
                    if (outerPass >= 5 && innerPass >= 5) {
                        Long totalTime = timings.get(strategyName);
                        timings.put(strategy.getClass().getSimpleName(), totalTime + System.currentTimeMillis());
                    }
                    CHECKSUM += filtered.hashCode();
                    System.err.printf("%-30s %d %d %d%n", strategy.getClass().getSimpleName(), outerPass, innerPass, filtered.size());
                    strategy.clear();
                }
            }
        }
        for (Map.Entry<String, Long> e: timings.entrySet()) {
            System.err.printf("%-30s: %9.2fms%n", e.getKey(), e.getValue() * (1.0/25.0));
        }
    }

    public static volatile int CHECKSUM = 0;

    static void populate(Collection<Point> dst, Random rnd) {
        for (int i = 0; i < INITIAL_SIZE; ++ i) {
            dst.add(new Point(rnd.nextInt(), rnd.nextInt()));
        }
    }

    static boolean wanted(Point p) {
        return p.hashCode() % 20 != 0;
    }

    static abstract class FilterStrategy {
        abstract void clear();
        abstract Collection<Point> filter();
        abstract void populate(Random rnd);
    }

    static final int INITIAL_SIZE = 100000;

    private static class CopyIntoNewListWithIterator extends FilterStrategy {
        public CopyIntoNewListWithIterator() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            ArrayList<Point> dst = new ArrayList<Point>(list.size());
            for (Point p: list) {
                if (wanted(p)) dst.add(p);
            }
            return dst;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class CopyIntoNewListWithoutIterator extends FilterStrategy {
        public CopyIntoNewListWithoutIterator() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            int inputSize = list.size();
            ArrayList<Point> dst = new ArrayList<Point>(inputSize);
            for (int i = 0; i < inputSize; ++ i) {
                Point p = list.get(i);
                if (wanted(p)) dst.add(p);
            }
            return dst;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class FilterLinkedListInPlace extends FilterStrategy {
        public String toString() {
            return getClass().getSimpleName();
        }
        FilterLinkedListInPlace() {
            list = new LinkedList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (Iterator<Point> it = list.iterator();
                 it.hasNext();
                 ) {
                Point p = it.next();
                if (! wanted(p)) it.remove();
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final LinkedList<Point> list;
    }

    private static class RandomRemoveByIndex extends FilterStrategy {
        public RandomRemoveByIndex() {
            list = new ArrayList<Point>(INITIAL_SIZE);
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (int i = 0; i < list.size();) {
                if (wanted(list.get(i))) {
                    ++ i;
                } else {
                    list.remove(i);
                }
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

    private static class SequentialRemoveByIndex extends FilterStrategy {
        public SequentialRemoveByIndex() {
            list = new LinkedList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            for (int i = 0; i < list.size();) {
                if (wanted(list.get(i))) {
                    ++ i;
                } else {
                    list.remove(i);
                }
            }
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final LinkedList<Point> list;
    }

    private static class ShiftDown extends FilterStrategy {
        public ShiftDown() {
            list = new ArrayList<Point>();
        }
        @Override
        void clear() {
            list.clear();
        }
        @Override
        Collection<Point> filter() {
            int inputSize = list.size();
            int outputSize = 0;
            for (int i = 0; i < inputSize; ++ i) {
                Point p = list.get(i);
                if (wanted(p)) {
                    list.set(outputSize++, p);
                }
            }
            list.subList(outputSize, inputSize).clear();
            return list;
        }
        @Override
        void populate(Random rnd) {
            ListBenchmark.populate(list, rnd);
        }
        private final ArrayList<Point> list;
    }

}

Array copy is a rather unexpensive operation. It is done on a very basic level (its a java native static method) and you are not yet in the range where the performance becomes really important.

In your example you copy approx 12000 times an array of size 150000 (on average). This does not take much time. I tested it here on my laptop and it took less than 500 ms.

Update I used the following code to measure on my laptop (Intel P8400)

import java.util.Random;

public class PerformanceArrayCopy {

    public static void main(String[] args) {

        int[] lengths = new int[] { 10000, 50000, 125000, 250000 };
        int[] loops = new int[] { 1000, 5000, 10000, 20000 };

        for (int length : lengths) {
            for (int loop : loops) {

                Object[] list1 = new Object[length];
                Object[] list2 = new Object[length];

                for (int k = 0; k < 100; k++) {
                    System.arraycopy(list1, 0, list2, 0, list1.length);
                }

                int[] len = new int[loop];
                int[] ofs = new int[loop];

                Random rnd = new Random();
                for (int k = 0; k < loop; k++) {
                    len[k] = rnd.nextInt(length);
                    ofs[k] = rnd.nextInt(length - len[k]);
                }

                long n = System.nanoTime();
                for (int k = 0; k < loop; k++) {
                    System.arraycopy(list1, ofs[k], list2, ofs[k], len[k]);
                }
                n = System.nanoTime() - n;
                System.out.print("length: " + length);
                System.out.print("\tloop: " + loop);
                System.out.print("\truntime [ms]: " + n / 1000000);
                System.out.println();
            }
        }
    }
}

Some results:

length: 10000   loop: 10000 runtime [ms]: 47
length: 50000   loop: 10000 runtime [ms]: 228
length: 125000  loop: 10000 runtime [ms]: 575
length: 250000  loop: 10000 runtime [ms]: 1198

I think the difference in performance is likely coming down to the difference that ArrayList supports random access where LinkedList does not.

If I want to get(1000) of an ArrayList I am specifying a specific index to access this, however LinkedList doesn't support this as it is organized through Node references.

If I call get(1000) of LinkedList, it will iterate the entire list until if finds index 1000 and this can be exorbitantly expensive if you have a large number of items in the LinkedList.


Interesting and unexpected results. This is just a hypothesis, but...

On average one of your array element removals will require moving half of your list (everything after it) back one element. If each item is a 64-bit pointer to an object (8 bytes), then this means copying 125000 items x 8 Bytes per pointer = 1 MB.

A modern CPU can copy a contiguous block of 1 MB of RAM to RAM pretty quickly.

Compared to looping over a linked list for every access, which requires comparisons and branching and other CPU unfriendly activities, the RAM copy is fast.

You should really try benchmarking the various operations independently and see how efficient they are with various list implementations. Share your results here if you do!