I had some fun comparing the speed of the removeAll(Collection<?> c) call declared in Collection. Now I know that micro-benchmarks are difficult to do right, and I won't look at a difference of a few milliseconds, but I believe my results to be valid, since I ran them repeatedly and they are very reproducible.
Let’s assume I have two collections that are not too tiny, say 100,000 consecutive integer elements, and also that they mostly overlap, for instance 5,000 are in the left but not the right. Now I simply call:
left.removeAll(right);
Of course this all depends on the types of both the left and the right collection. It's blazingly fast if the right collection is a hash set, because that's where the look-ups are done. But looking closer, I noticed two results that I cannot explain. I tried all the tests both with an ArrayList that is sorted and with another that is shuffled (using Collections.shuffle(), if that is of importance).
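For reference, here is a minimal sketch of a setup matching the description above (the class name, the population code and the timing loop are my assumptions, not the original harness):

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

public class RemoveAllBench {
    public static void main(String[] args) {
        // 100,000 consecutive integers in the left collection
        List<Integer> left = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            left.add(i);
        }
        // the right collection overlaps except for the first 5,000 values,
        // so 5,000 elements are in the left but not the right
        List<Integer> right = new ArrayList<>(left.subList(5_000, left.size()));
        Collections.shuffle(right);        // drop this line for the sorted variant
        HashSet<Integer> rightSet = new HashSet<>(right);

        long start = System.nanoTime();
        left.removeAll(rightSet);          // swap in 'right' to test the list-vs-list case
        long elapsed = System.nanoTime() - start;
        System.out.println("removeAll took " + elapsed / 1_000_000 + " ms, "
                + left.size() + " elements remain");
    }
}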
The first weird result is:
00293 025% shuffled ArrayList, HashSet
00090 008% sorted ArrayList, HashSet
Now either removing elements from the sorted ArrayList is faster than removing from the shuffled list, or looking up consecutive values from the HashSet is faster than looking up random values.
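One way to tell the two effects apart would be to time the lookups in isolation, without mutating the left list. A hypothetical helper, meant to be dropped into the sketch above:

// Probe the set for every element of 'left' without removing anything,
// so only the lookup half of removeAll is measured.
static long timeLookupsOnly(java.util.List<Integer> left, java.util.Set<Integer> rightSet) {
    long start = System.nanoTime();
    int hits = 0;
    for (Integer value : left) {
        if (rightSet.contains(value)) {
            hits++;
        }
    }
    if (hits < 0) System.out.println(hits);   // keep 'hits' live so the JIT cannot drop the loop
    return System.nanoTime() - start;
}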
Now the other one:
02311 011% sorted ArrayList, shuffled ArrayList
01401 006% sorted ArrayList, sorted ArrayList
Now this suggests that the lookup in the sorted ArrayList (using a contains() call for each element of the left list) is faster than in the shuffled list. That would be easy to explain if the lookup exploited the fact that the list is sorted and used a binary search, but I do not do that.
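For clarity, the loop below is roughly what left.removeAll(right) amounts to when right is another ArrayList; a simplified sketch of the effective behavior, not the actual JDK implementation (which compacts the backing array in one pass via batchRemove):

// Roughly what removeAll does with a list argument: one linear
// ArrayList.contains() scan of 'right' per element of 'left'.
static void removeAllSketch(java.util.List<Integer> left, java.util.List<Integer> right) {
    java.util.Iterator<Integer> it = left.iterator();
    while (it.hasNext()) {
        if (right.contains(it.next())) {   // O(n) scan; sortedness is never exploited
            it.remove();
        }
    }
}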
Both results are mysterious to me. I cannot explain them by looking at the code or with my data-structure knowledge. Does it have anything to do with processor cache access patterns? Is the JIT compiler optimizing something? And if so, what? I performed warm-up and ran the tests several times in a row, but perhaps there is a fundamental problem with my benchmark?
The reason for the performance difference is the memory access pattern: accessing elements that are consecutive in memory is faster than doing random memory accesses (due to memory pre-fetching, CPU caches, etc.).
When you initially populate the collection, you create all the elements sequentially in memory, so when you traverse it (forEach, removeAll, etc.) you are accessing consecutive memory regions, which is cache friendly. When you shuffle the collection, the elements remain in the same order in memory, but the pointers to those elements are no longer in that order, so while traversing the collection you will access, for instance, the 10th, then the 1st, then the 5th element, which is very cache unfriendly and ruins the performance.
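A small sketch that makes this effect visible in isolation: the Integer objects are created in allocation order, shuffling only permutes the references to them, and traversing in the permuted order is measurably slower (sizes and names are arbitrary choices, and as with any micro-benchmark the numbers are only indicative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TraversalOrderDemo {
    public static void main(String[] args) {
        // Allocate boxed integers one after another; values above 127 bypass
        // the Integer cache, so each element is a fresh object on the heap.
        List<Integer> sequential = new ArrayList<>();
        for (int i = 0; i < 2_000_000; i++) {
            sequential.add(i + 128);
        }
        // Same objects, but the references in the backing array are permuted.
        List<Integer> shuffled = new ArrayList<>(sequential);
        Collections.shuffle(shuffled);

        System.out.println("allocation order: " + timeSum(sequential) + " ms");
        System.out.println("shuffled order  : " + timeSum(shuffled) + " ms");
    }

    // Dereference every element, just like a contains()-per-element loop does.
    static long timeSum(List<Integer> list) {
        long start = System.nanoTime();
        long sum = 0;
        for (Integer value : list) {
            sum += value;
        }
        if (sum == -1) System.out.println(sum);   // keep 'sum' live
        return (System.nanoTime() - start) / 1_000_000;
    }
}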
You can look at this question, where this effect is visible in greater detail: Why filtering an unsorted list is faster than filtering a sorted list