Update: 2009-05-29
Thanks for all the suggestions and advice. I used your suggestions to make my production code execute 2.5 times faster on average than my best result a couple of days ago. In the end I was able to make the Java code the faster of the two.
Lessons:
My example code below shows the insertion of primitive ints, but the production code is actually storing strings (my bad). When I corrected that, the Python execution time went from 2.8 seconds to 9.6 seconds. So right off the bat, Java was actually faster when storing objects.
But it doesn't stop there. I had been executing the java program as follows:
java -Xmx1024m SpeedTest
But if you set the initial heap size as follows you get a huge improvement:
java -Xms1024m -Xmx1024m SpeedTest
This simple change reduced the execution time by more than 50%. So the final result for my SpeedTest is Python 9.6 seconds, Java 6.5 seconds.
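Presumably the gain comes from the JVM committing the whole 1024 MB up front instead of repeatedly growing the heap (and doing the associated GC work) while the set fills up. If you want to confirm the flag took effect, a quick sketch like the one below prints the committed heap next to the maximum (HeapCheck is just an illustrative name):

class HeapCheck
{
    public static void main(String[] args)
    {
        // With -Xms1024m -Xmx1024m the committed heap (totalMemory) should
        // already be close to the maximum heap before any work is done.
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap  = " + rt.maxMemory()   / (1024 * 1024) + " MB");
        System.out.println("committed = " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("free      = " + rt.freeMemory()  / (1024 * 1024) + " MB");
    }
}

Run it with: java -Xms1024m -Xmx1024m HeapCheck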
Original Question:
I had the following python code:
import time
import sys

def main(args):
    iterations = 10000000
    counts = set()
    startTime = time.time()
    for i in range(0, iterations):
        counts.add(i)
    totalTime = time.time() - startTime
    print 'total time =', totalTime
    print len(counts)

if __name__ == "__main__":
    main(sys.argv)
It executed in about 3.3 seconds on my machine, but I wanted to make it faster, so I decided to program it in Java. I assumed that because Java is compiled and is generally considered to be faster than Python, I would see some big payback.
Here is the java code:
import java.util.*;

class SpeedTest
{
    public static void main(String[] args)
    {
        long startTime;
        long totalTime;
        int iterations = 10000000;
        HashSet counts = new HashSet((2*iterations), 0.75f);
        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++)
        {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = " + (totalTime/1000f));
        System.out.println(counts.size());
    }
}
So this Java code does essentially the same thing as the Python code, but it executed in 8.3 seconds instead of 3.3.
I have extracted this simple example from a real-world program to simplify things. The critical element is that I have a collection (a set or HashSet) that ends up with a lot of members, much like the example.
Here are my questions:
How come my Python implementation is faster than my Java implementation?
Is there a better data structure than HashSet (Java) to hold a unique collection?
What would make the Python implementation faster?
What would make the Java implementation faster?
UPDATE:
Thanks to all who have contributed so far. Please allow me to add some details.
I have not included my production code because it is quite complex and would generate a lot of distraction. The case I present above is as simplified as possible. By that I mean that the Java HashSet insertion (add()) seems to be much slower than the Python set's add().
The Java implementation of the production code is also about 2.5-3 times slower than the Python version -- just like the above.
I am not concerned about VM warmup or startup overhead. I just want to compare the code from my startTime to my totalTime. Please do not concern yourselves with other matters.
I initialized the HashSet with more than enough buckets so that it should never have to rehash. (I will always know ahead of time how many elements the collection will ultimately contain.) I suppose one could argue that I should have initialized it to iterations/0.75, but if you try it you will see that execution time is not significantly impacted.
I set -Xmx1024m for those who were curious (my machine has 4GB of RAM).
I am using java version: Java(TM) SE Runtime Environment (build 1.6.0_13-b03).
In the production version I am storing strings (2-15 chars) in the HashSet, so I cannot use primitives, although that is an interesting case. (A sketch of that string-storing variant appears below, after these notes.)
I have run the code many, many times. I have very high confidence that the Python code is between 2.5 and 3 times faster than the Java code.
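For reference, the string-storing case described above looks roughly like the sketch below. The class name and the random key generation are purely illustrative (the production keys are not random), but it shows a HashSet<String> sized so that it never rehashes, i.e. an initial capacity of at least iterations / 0.75.

import java.util.HashSet;
import java.util.Random;

class StringSpeedTest
{
    public static void main(String[] args)
    {
        int iterations = 10000000;
        // Sized so that iterations elements never trigger a rehash:
        // capacity * loadFactor stays >= iterations.
        HashSet<String> counts = new HashSet<String>((int) (iterations / 0.75f) + 1, 0.75f);
        Random random = new Random(42);

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++)
        {
            // Illustrative keys only; the real production keys are 2-15 character strings.
            counts.add(Integer.toString(random.nextInt(), 36));
        }
        long totalTime = System.currentTimeMillis() - startTime;

        System.out.println("TOTAL TIME = " + (totalTime / 1000f));
        System.out.println(counts.size());
    }
}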
Python and Java are two of the most popular and robust programming languages. Java is generally faster and more efficient than Python because Java code is compiled to bytecode and JIT-compiled by the JVM, while CPython interprets its bytecode. Python, on the other hand, has a simpler, more concise syntax and can express the same functionality as Java in fewer lines of code.
Python's built-in functions, such as min, max, all, and map, are fast because they are implemented in C. You should use these built-ins instead of writing the equivalent loops by hand; they will help your code execute faster.
Although Java is faster, Python is more versatile, easier to read, and has a simpler syntax. According to Stack Overflow, this general-purpose, interpreted language is the fourth most popular programming language [1].
Python programs are generally expected to run slower than Java programs, but they also take much less time to develop. Python programs are typically 3-5 times shorter than equivalent Java programs. This difference can be attributed to Python's built-in high-level data types and its dynamic typing.
You're not really testing Java vs. Python; you're testing java.util.HashSet using autoboxed Integers vs. Python's native set and integer handling. Apparently, the Python side in this particular microbenchmark is indeed faster.

I tried replacing HashSet with TIntHashSet from GNU Trove and achieved a speedup factor between 3 and 4, bringing Java slightly ahead of Python.
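That replacement looks roughly like the sketch below. It assumes Trove is on the classpath; the package gnu.trove.set.hash.TIntHashSet is the Trove 3 layout, while Trove 2 shipped the class as gnu.trove.TIntHashSet.

import gnu.trove.set.hash.TIntHashSet; // Trove 3 package; Trove 2 used gnu.trove.TIntHashSet

class TroveSpeedTest
{
    public static void main(String[] args)
    {
        int iterations = 10000000;
        // A primitive int set: no Integer autoboxing and no per-element entry object.
        TIntHashSet counts = new TIntHashSet(2 * iterations);

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++)
        {
            counts.add(i); // stores the raw int directly
        }
        long totalTime = System.currentTimeMillis() - startTime;

        System.out.println("TOTAL TIME = " + (totalTime / 1000f));
        System.out.println(counts.size());
    }
}

The speedup comes from storing raw ints in primitive arrays, which avoids both the boxing of each int and the allocation of a per-element entry object.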
The real question is whether your example code is really as representative of your application code as you think. Have you run a profiler and determined that most of the CPU time is spent in putting a huge number of ints into a HashSet? If not, the example is irrelevant. Even if the only difference is that your production code stores other objects than ints, their creation and the computation of their hashcode could easily dominate the set insertion (and totally destroy Python's advantage in handling ints specially), making this whole question pointless.
I suspect the reason is that Python uses the integer value itself as its hash, and the hashtable-based implementation of set uses that value directly. From the comments in the source:
This isn't necessarily bad! To the contrary, in a table of size 2**i, taking the low-order i bits as the initial table index is extremely fast, and there are no collisions at all for dicts indexed by a contiguous range of ints. The same is approximately true when keys are "consecutive" strings. So this gives better-than-random behavior in common cases, and that's very desirable.
This microbenchmark is somewhat of a best case for Python because it results in exactly zero hash collisions, whereas, if Java's HashSet is rehashing the keys, it has to perform that additional work and may also get much worse behavior from collisions.
If you store range(iterations) in a temporary variable and do a random.shuffle on it before the loop, the runtime is more than 2x slower, even if the shuffle and list creation are done outside the timed loop.
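For contrast with Python's identity hash for ints, Java 6's HashSet (backed by HashMap) does not use Integer.hashCode() directly; it spreads the bits with a supplemental hash before picking a bucket. The sketch below is reconstructed from the idea of that Java 6 spreading step, not copied verbatim from the JDK, and HashSpread is a made-up class name.

class HashSpread
{
    // Roughly the supplemental hash java.util.HashMap (and therefore HashSet)
    // applies in Java 6 on top of the key's hashCode() before choosing a bucket.
    // This bit-spreading is the extra per-insert work referred to above;
    // CPython skips it and uses the int value itself as the hash.
    static int supplementalHash(int h)
    {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args)
    {
        // CPython would hash these ints to themselves; the Java 6 spread scatters them.
        for (int i = 1000000; i < 1000004; i++)
        {
            System.out.println(i + " -> " + supplementalHash(i));
        }
    }
}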