Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why ArrayList performance differs if it is referenced as List?

It was mentioned in the kjellkod's article that if we pass ArrayList in the method which receives List as an argument, then we lose in performance because ArrayList implements additional RandomAccess interface. Example from article:

// SLOWER: as shown in http://ideone.com/1wnF1
private static void linearInsertion(Integer[] intArray, List<Integer> list) {
[...]
int list_size = list.size();
for (int i = 0; i < list_size; i++) {
if (integer.compareTo(list.get(i)) >= 0) { // ... more code

// FASTER: as shown in http://ideone.com/JOJ05
private static void linearInsertion(Integer[] intArray, ArrayList<Integer> list) {
[...]
int list_size = list.size();
for (int i = 0; i < list_size; i++) {
if (integer.compareTo(list.get(i)) >= 0) { // ... more code

From reference:

Generic list algorithms are encouraged to check whether the given list is an instanceof this interface before applying an algorithm that would provide poor performance if it were applied to a sequential access list, and to alter their behavior if necessary to guarantee acceptable performance.

However, if we really pass ArrayList in the above methods and check list instanceof RandomAccess, it will be true in both cases. So, my first question is why the Java VM should interpret this as a sequential list in the first method?

I have modified tests from article to check this behavior on my machine. When test is ran on the ideone, it shows results similar to kjellkod's. But when I ran it locally, I got unexpected results, which are contrary to article explanation as well as to my understanding. It seems that in my case ArrayList as List iteration is 5-25% faster than referencing it as an ArrayList:

enter image description here

How can this difference be explained? Does it depend on architecture or number of processor cores? My working machine configuration is Windows 7 Professional x64, Intel Core i5-3470 (4 cores, 4 threads), 16 GB RAM.

like image 263
Roman Petrenko Avatar asked Jun 11 '13 11:06

Roman Petrenko


2 Answers

So, my first question is why the Java VM should interpret this as a sequential list in the first method?

The JVM has no concept of sequential or random access lists. (Apart from the marker interface) It is the developer of the implementation which recognises that ArrayList perform random access lookups in O(1) time instead of O(n) time.

Does it depend on architecture or number of processor cores?

No, you will see a difference between -client e.g. 32-bit Windows and -server e.g. any 64-bit JVM.


I suspect you ran the List test second. This is likely to be faster as the code is more warned up both in the JIT as well as the CPU cache. I suggest you perform each test at least three times, running your longest tests first and ignore the first run you do.

Note: contains() is O(n) for a List which is why your timings grow O(n^2) Obviously you wouldn't use a List if you wanted to ignore duplicates and looking at the behaviour of really inefficient code can be confusing as you are very susceptible to the subtleties of what gets optimised and what doesn't. You will get much more meaningful results from comparing code which is already reasonably optimal.

like image 58
Peter Lawrey Avatar answered Oct 08 '22 17:10

Peter Lawrey


Though the code is the same in both methods still theoretically there may be a difference because at JVM level invoking an interface method is different than invoking a class method. They are 2 different bytecode operations: invokeinterface and invokevirtual. See http://bobah.net/d4d/source-code/misc/invokevirtual-vs-invokeinterface-performance-benchmark

like image 30
Evgeniy Dorofeev Avatar answered Oct 08 '22 19:10

Evgeniy Dorofeev