Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java sort via Comparator<T> spends majority of its time in compare(Object,Object)

I noticed something odd while doing profiling our code base. It seemed that sorting with a typed comparator (e.g. Comparator<MyClass>) always first called a method Comparator<MyClass>.compare(Object,Object) which then called the method Comparator<MyClass>.compare(MyClass,MyClass). Furthermore, the vast majority of the time was spent in the Comparator<MyClass>.compare(Object,Object). To explore further, I made a little test program:

public class Sandbox {
    public static void main(String argv[]) {
        for(int j=0; j<100; j++) {
            int n = 10000;
            SortMe[] sortMes = new SortMe[n];
            for (int i=0; i<n; i++) {
                sortMes[i] = new SortMe(Math.random());
            }
            Arrays.sort(sortMes, new SortMeComp());
            System.out.println(Arrays.toString(sortMes));
        }
        for(int j=0; j<100; j++) {
            int n = 10000;
            SortMe[] sortMes = new SortMe[n];
            for (int i=0; i<n; i++) {
                sortMes[i] = new SortMe(Math.random());
            }
            Arrays.sort(sortMes, new SortMeCompTypeless());
            System.out.println(Arrays.toString(sortMes));
        }
    }
}

The typed Comparator:

public class SortMeComp implements Comparator<SortMe>{
    public int compare(SortMe one, SortMe two) {
        if(one.getValue()>two.getValue()) {
            return -1;
        } else if (one.getValue()<two.getValue()) {
            return 1;
        } else {
            return 0;
        }
    }
}

The untyped Comparator I made for comparison:

public class SortMeCompTypeless implements Comparator{
    public int compare(Object oneObj, Object twoObj) {
        SortMe one = (SortMe) oneObj;
        SortMe two = (SortMe) twoObj;
        if(one.getValue()>two.getValue()) {
            return -1;
        } else if (one.getValue()<two.getValue()) {
            return 1;
        } else {
            return 0;
        }
    }
}

Here are the results (from YourKit profiler; let me know if you'd rather have a screenshot):

+----------------------------------------------------+-----------------+-----------------+--------------------+
|                        Name                        |    Time (ms)    |  Own Time (ms)  |  Invocation Count  |
+----------------------------------------------------+-----------------+-----------------+--------------------+
|  +---java.util.Arrays.sort(Object[], Comparator)   |  23,604  100 %  |          8,096  |               200  |
|    |                                               |                 |                 |                    |
|    +---SortMeComp.compare(Object, Object)          |  11,395   48 %  |          7,430  |        12,352,936  |
|    | |                                             |                 |                 |                    |
|    | +---SortMeComp.compare(SortMe, SortMe)        |   3,965   17 %  |          3,965  |        12,352,936  |
|    |                                               |                 |                 |                    |
|    +---SortMeCompTypeless.compare(Object, Object)  |   4,113   17 %  |          4,113  |        12,354,388  |
+----------------------------------------------------+-----------------+-----------------+--------------------+

I ran the profile without filtering, and you see the recursive calls to mergeSort (which just make it hard to read), but nothing of interest.

So what's going on here? Where is that method SortMeComp.compare(Object,Object) coming from? We figured it was something that Java creates internally for dealing with generics, but what could be taking so long? I would think the jvm would just treat a generic method like an "untyped"/Object method. As you can see, a simple cast is far quicker. Besides which, I would think this is exactly the kind of thing that would be JITed away even if the jvm needed to do upfront type stuff. What's going on here?

By the way:

$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

Edit:

In response to savinos answer, I tried simulating the extra method call with an 'untyped' Comparator that simply cast to a typed compare:

public class SortMeCompMethodCalls implements Comparator{
    public int compare(Object oneObj, Object twoObj) {
        return compare((SortMe)oneObj, (SortMe)twoObj);
    }
    public int compare(SortMe one, SortMe two) {
        if(one.getValue()>two.getValue()) {
            return -1;
        } else if (one.getValue()<two.getValue()) {
            return 1;
        } else {
            return 0;
        }
    }
}

Here are the results:

+---------------------------------------------------------+-----------------+-----------------+--------------------+
|                          Name                           |    Time (ms)    |  Own Time (ms)  |  Invocation Count  |
+---------------------------------------------------------+-----------------+-----------------+--------------------+
|  +---java.util.Arrays.sort(Object[], Comparator)        |  31,044  100 %  |          8,061  |               200  |
|    |                                                    |                 |                 |                    |
|    +---SortMeComp.compare(Object, Object)               |  11,554   37 %  |          7,617  |        12,354,392  |
|    | |                                                  |                 |                 |                    |
|    | +---SortMeComp.compare(SortMe, SortMe)             |   3,936   13 %  |          3,936  |        12,354,392  |
|    |                                                    |                 |                 |                    |
|    +---SortMeCompMethodCalls.compare(Object, Object)    |  11,427   37 %  |          7,613  |        12,352,146  |
|      |                                                  |                 |                 |                    |
|      +---SortMeCompMethodCalls.compare(SortMe, SortMe)  |   3,814   12 %  |          3,814  |        12,352,146  |
+---------------------------------------------------------+-----------------+-----------------+--------------------+

So it looks like savinos is right! The extra time is only the extra method call (plus a little for the cast). That seems crazy to me; you'd think that'd be JITed away? Ah well.

I removed edit 2 and added it as an answer as it should've been originally.

like image 396
Bryan Head Avatar asked Oct 06 '11 17:10

Bryan Head


People also ask

What is the time complexity of Comparator in Java?

The time complexity including the comparator is O(n * (log n) * c) , where c is the time complexity of the comparator. By default, Arrays. sort uses a comparator following the natural ordering of the content for an O(1) time complexity.

How does sort work with Comparator?

Using a comparator, we can sort the elements based on data members. For instance, it may be on roll no, name, age, or anything else. Method of Collections class for sorting List elements is used to sort the elements of List by the given comparator.

How does Compare method of Comparator work in Java?

The compare Method obj1 and obj2 are the objects to be compared. This method returns zero if the objects are equal. It returns a positive value if obj1 is greater than obj2. Otherwise, a negative value is returned.


2 Answers

I may be wrong, but I would say that the delta between the "Object" comparator and the typed comparator (which is called by the generic one) is simply due to the extra function call.

Consider that you are doing 12,352,936 invocations, which means roughly 5.7*10^-7 seconds per function call, which is not unreasonable.

like image 137
Savino Sguera Avatar answered Oct 06 '22 01:10

Savino Sguera


A little off topic, but you must want it to be fast...

You'll reduce the inner compare() time by around 50%, with random data, if you change it to something like:

public int compare(SortMe one, SortMe two) {
    return one.getValue() - two.getValue();
}

However, this is only valid if the magnitude of the range of inputs is smaller than 2^31. If larger, the difference overflows.

like image 35
Ed Staub Avatar answered Oct 05 '22 23:10

Ed Staub