How does java.util.Collections.contains() perform faster than a linear search?

Tags:

I've been fooling around with a bunch of different ways of searching collections, collections of collections, etc. Doing lots of stupid little tests to verify my understanding. Here is one which boggles me (source code further below).

In short, I am generating N random integers and adding them to a list. The list is NOT sorted. I then use Collections.contains() to look for a value in the list. I intentionally look for a value that I know won't be there, because I want to ensure that the entire list space is probed. I time this search.

I then do another linear search manually, iterating through each element of the list and checking if it matches my target. I also time this search.

On average, the second search takes 33% longer than the first one. By my logic, the first search must also be linear, because the list is unsorted. The only possibility I could think of (which I immediately discard) is that Java is making a sorted copy of my list just for the search, but (1) I did not authorize that usage of memory space and (2) I would think that would result in MUCH more significant time savings with such a large N.

So if both searches are linear, they should both take the same amount of time. Somehow the Collections class has optimized this search, but I can't figure out how. So... what am I missing?

import java.util.*;

public class ListSearch {

    public static void main(String[] args) {

        int N = 10000000; // number of ints to add to the list
        int high = 100; // upper limit for random int generation

        List<Integer> ints;
        int target = -1; // target will not be found, forces search of entire list space

        long start;
        long end;

        ints = new ArrayList<Integer>();
        start = System.currentTimeMillis();
        System.out.print("Generating new list... ");
        for (int i = 0; i < N; i++) {
            ints.add(((int) (Math.random() * high)) + 1);
        }
        end = System.currentTimeMillis();
        System.out.println("took "  + (end-start) + "ms.");
        start = System.currentTimeMillis();
        System.out.print("Searching list for target (method 1)... ");
        if (ints.contains(target)) {
            // nothing
        }
        end = System.currentTimeMillis();
        System.out.println(" Took "  + (end-start) + "ms.");

        System.out.println();

        ints = new ArrayList<Integer>();
        start = System.currentTimeMillis();
        System.out.print("Generating new list... ");
        for (int i = 0; i < N; i++) {
            ints.add(((int) (Math.random() * high)) + 1);
        }
        end = System.currentTimeMillis();
        System.out.println("took "  + (end-start) + "ms.");
        start = System.currentTimeMillis();
        System.out.print("Searching list for target (method 2)... ");
        for (Integer i : ints) {
            // nothing
        }
        end = System.currentTimeMillis();
        System.out.println(" Took "  + (end-start) + "ms.");
    }
}

EDIT: Below is a new version of this code. The interesting thing is that now my manual linear loop performs 16% faster than the contains method (NOTE: both are designed to intentionally search the entire list space, so I know they are equal in number of iterations). I can't account for this 16% gain... more confusion.

import java.util.*;

public class ListSearch {

    public static void main(String[] args) {

        int N = 10000000; // number of ints to add to the list
        int high = 100; // upper limit for random int generation

        List<Integer> ints;
        int target = -1; // target will not be found, forces search of entire list space

        long start;
        long end;

        ints = new ArrayList<Integer>();
        start = System.currentTimeMillis();
        System.out.print("Generating new list... ");
        for (int i = 0; i < N; i++) {
            ints.add(((int) (Math.random() * high)) + 1);
        }
        end = System.currentTimeMillis();
        System.out.println("took "  + (end-start) + "ms.");
        start = System.currentTimeMillis();
        System.out.print("Searching list for target (method 1)... ");
        if (ints.contains(target)) {
            System.out.println("hit");
        }
        end = System.currentTimeMillis();
        System.out.println(" Took "  + (end-start) + "ms.");

        System.out.println();

        ints = new ArrayList<Integer>();
        start = System.currentTimeMillis();
        System.out.print("Generating new list... ");
        for (int i = 0; i < N; i++) {
            ints.add(((int) (Math.random() * high)) + 1);
        }
        end = System.currentTimeMillis();
        System.out.println("took "  + (end-start) + "ms.");
        start = System.currentTimeMillis();
        System.out.print("Searching list for target (method 2)... ");
        for (int i = 0; i < N; i++) {
            if (ints.get(i) == target) {
                System.out.println("hit");
            }
        }
        end = System.currentTimeMillis();
        System.out.println(" Took "  + (end-start) + "ms.");
    }
}

874

asked Oct 20 '12 04:10

The111

3 Answers

Your comparison code is buggy, and this is distorting your results.

This does search for the target

    if (ints.contains(target)) {
        // nothing
    }

But this does not!

    for (Integer i : ints) {
        // nothing
    }

You are actually just iterating over the list elements without testing them.

Having said that, the second version is slower than the first version for one or more of the following reasons:

The first version will be iterating the backing array using a simple for loop and an index. The second version is equivalent to the following:
```
Iterator<Integer> it = ints.iterator();
while (it.hasNext()) {
    Integer i = (Integer) it.next();
}
```
In other words, each time around the loop involves 2 method calls and a typecast¹.
The first version will return true as soon as it gets a match. Because of the bug in your implementation, the second version will iterate the entire list every time. In fact, given the choice of N and high, this effect is the most likely the main cause of the difference in performance.

^{1 - Actually, it is not entirely clear what the JIT compiler will do with all of this. It could in theory inline the method calls, deduce that the typcaset is unnecessary, or even optimize away the entire loop. On the other hand, there are factors that may inhibit these optimizations. For instance, ints is declared as List<Integer>which could inhibit inlining ... unless the JIT is able to deduce that the actual type is always the same.}

Your results are possibly also distorted for another reason. Your code does not take account of JVM warmup. Read this Question for more details: How do I write a correct micro-benchmark in Java?

answered Oct 07 '22 23:10

Stephen C

Here is the difference:

When you use contains, it use the internal array of the object and does a search like this:

    for (int i = 0; i < size; i++)
        if (searchObject.equals(listObject[i]))
            return true;
    return false;

Here when it tries to get the ith element, it directly gets the ith element object form the internal array.

When you write your own, you are writing like:

    for (Integer i : ints) {
        // nothing
    }

its equivalent to :

   for(Iterator<Integer> iter= ints.iterator(); iter.hasNext(); ) {
         Integer i = iter.next();
   }

Which is performing much more steps than the contains.

answered Oct 07 '22 22:10

Yogendra Singh

So I'm not entirely sure you're testing anything. Javac (compiler) is smart enough to realize that you don't have any code inside your for loop and your if statement. In this case, Java will remove that code from it's compilation. The reason you might be getting time back is because you're actually counting the time it takes to print a string. System output time can vary drastically depending on what your system is doing. When writing timing tests, any I/O can create invalid tests.

First I would remove the string prints from inside your timing.

Second of all, ArrayList.contains is linear. It doesn't use the special for loop like you're doing. Your loop has some overhead of getting the iterator from the collection and then iterating over it. This is how the special for loop works behind the scenes.

Hope this helps.

answered Oct 07 '22 23:10

Kurtymckurt

Related questions
                            
                                Convert Latin-1 content of InputStream into UTF-8 String
                            
                                JGraphT examples
                            
                                JFreeChart domain/range axes defined
                            
                                Java 7 JColorChooser: Disable Transparency Slider
                            
                                Testing thread-safety in Java [duplicate]
                            
                                Flyway and JPA integration
                            
                                Have Eclipse automatically import classes in jsp files specifically
                            
                                Single/complete Maven dependencies for Java EE 5 API
                            
                                java.util package - classes vs interfaces
                            
                                java.lang.UnsupportedClassVersionError: myApp : Unsupported major.minor version 51.0 (unable to load class myApp) On Linux
                            
                                java number exceeds long.max_value - how to detect?
                            
                                Difference in JComboBox and JSpinner
                            
                                How to get the component that invoked a JPopupMenu?
                            
                                How to set a faultCode in a SOAPFault?
                            
                                Ant JUnit ClassNotFoundException
                            
                                GetSerializableExtra returning null
                            
                                Hibernate (JPA) multiple @OneToMany for same model
                            
                                Play! framework vs Ruby on Rails [closed]
                            
                                How can I create a new annotation?
                            
                                Find all instances of a class while debugging in Eclipse

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How does java.util.Collections.contains() perform faster than a linear search?

Tags:

java

collections

linear-search