For a project at university, we had to implement a few different algorithms to compute the equivalence classes of a set of elements, given a collection of relations between those elements.
We were instructed to implement, among others, the Union-Find algorithm and its optimizations (Union by Depth, Union by Size). By accident (while doing something I thought was necessary for the algorithm's correctness), I discovered another way to optimize it.
It isn't as fast as Union by Depth, but close. I couldn't for the life of me figure out why it was as fast as it was, so I consulted one of the teaching assistants, who couldn't figure it out either.
The project was in Java, and the data structures I used were based on simple arrays of Integers (the object, not the primitive int).
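For reference, a standard union-by-size implementation over such Integer arrays looks roughly like this (a generic sketch, not the exact project code):

    // Generic Union-Find with union by size, using Integer[] as described above.
    class UnionFind {
        private final Integer[] parent; // parent[i] == i means i is a root
        private final Integer[] size;   // size[r] = elements in the tree rooted at r

        UnionFind(int n) {
            parent = new Integer[n];
            size = new Integer[n];
            for (int i = 0; i < n; i++) {
                parent[i] = i;  // every element starts as its own class
                size[i] = 1;
            }
        }

        int find(int x) {
            while (parent[x] != x) x = parent[x]; // walk up to the root
            return x;
        }

        void union(int a, int b) {
            int ra = find(a), rb = find(b);
            if (ra == rb) return;               // already in the same class
            if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
            parent[rb] = ra;                    // attach smaller tree to larger
            size[ra] += size[rb];
        }
    }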
Later, at the project's evaluation, I was told that it probably had something to do with 'Java caching', but I can't find anything online about how caching would affect this.
What would be the best way, without calculating the complexity of the algorithm, to prove or disprove that my optimization is this fast because of the way Java does things? Implement it in another (lower-level?) language? But who's to say that language won't do the same thing?
I hope I made myself clear,
thanks
The only way is to prove the worst-case (average-case, etc.) complexity of the algorithm.
Because if you don't, the speed you observe might just be a consequence of a combination of your particular test data and whatever the JVM happens to do to your code.
It is generally very difficult to perform such a task on modern VMs! As you hint, they do all sorts of things behind your back: method calls get inlined, objects are reused, and so on. A prime example is how trivial loops get compiled away entirely when they obviously do nothing but count, or how a function call in functional programming gets inlined or tail-call optimized.
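A concrete illustration of the dead-loop point (a hypothetical hand-rolled timing, not anyone's actual benchmark; harnesses like JMH exist precisely to avoid this trap):

    public class DeadLoopDemo {
        public static void main(String[] args) {
            long start = System.nanoTime();
            long sum = 0;
            for (int i = 0; i < 100_000_000; i++) {
                sum += i; // if 'sum' were never read afterwards, the JIT could
                          // legally remove the whole loop as dead code
            }
            long elapsed = System.nanoTime() - start;
            // Consuming the result keeps the loop alive; drop this println and
            // the measured time may collapse to near zero after JIT warm-up.
            System.out.println("sum = " + sum + ", ns = " + elapsed);
        }
    }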
Further, you have the difficulty of proving your point in general, on any data set. An O(n^2) algorithm can easily be much faster in practice than a seemingly faster, say O(n log n), algorithm. For example:
Big-O notation purposely ignores constants, which in a practical situation can mean life or death to your implementation, and those constants may be exactly what hit you. So in practice an algorithm whose running time is 0.00001 * n^2 is faster than one that takes 1000000 * n log n for any input of realistic size.
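To put numbers on that: with those (made-up) constants, the quadratic algorithm actually wins until n is in the trillions. A quick sketch to locate the crossover:

    // Find where 0.00001 * n^2 finally overtakes 1000000 * n * log2(n).
    public class Crossover {
        public static void main(String[] args) {
            for (long n = 2; ; n *= 2) {
                double quadratic = 0.00001 * n * n;
                double nLogN = 1_000_000.0 * n * (Math.log(n) / Math.log(2));
                if (quadratic > nLogN) {
                    System.out.println("n^2 becomes slower around n = " + n);
                    break; // prints a value on the order of 4 * 10^12
                }
            }
        }
    }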
So reasoning is hard given the limited information you provide.
It is likely that either the compiler or the JVM found an optimisation for your code. You could try reading the bytecode output by the javac compiler (for example with javap -c), and disabling runtime JIT compilation with the -Djava.compiler=NONE option.
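For example (the class names here are placeholders; javap ships with the JDK, and on modern HotSpot VMs -Xint is the usual way to force interpreter-only execution):

    javap -c UnionFind               # disassemble UnionFind.class to bytecode
    java -Djava.compiler=NONE Main   # disable JIT compilation (older JVMs)
    java -Xint Main                  # interpreter-only mode on modern HotSpot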