Why is this Java code 6x faster than the identical C# code?

Tags: java, c#, execution-time

I have a few different solutions to Project Euler problem 5, but the execution time difference between the two languages/platforms in this particular implementation intrigues me. I didn't do any optimization with compiler flags, just plain javac (via the command line) and csc (via Visual Studio).

Here's the Java code. It finishes in 55ms.

public class Problem005b
{
    public static void main(String[] args)
    {
        long begin = System.currentTimeMillis();
        int i = 20;
        while (true)
        {
            if (
                    (i % 19 == 0) &&
                    (i % 18 == 0) &&
                    (i % 17 == 0) &&
                    (i % 16 == 0) &&
                    (i % 15 == 0) &&
                    (i % 14 == 0) &&
                    (i % 13 == 0) &&
                    (i % 12 == 0) &&
                    (i % 11 == 0)
                )
            {
                break;
            }
            i += 20;
        }
        long end = System.currentTimeMillis();
        System.out.println(i);
        System.out.println(end-begin + "ms");
    }   
}

Here is the identical C# code. It finishes in 320ms.

using System;

namespace ProjectEuler05
{
    class Problem005
    {
        static void Main(String[] args)
        {
            DateTime begin = DateTime.Now;
            int i = 20;
            while (true)
            {
                if (
                        (i % 19 == 0) &&
                        (i % 18 == 0) &&
                        (i % 17 == 0) &&
                        (i % 16 == 0) &&
                        (i % 15 == 0) &&
                        (i % 14 == 0) &&
                        (i % 13 == 0) &&
                        (i % 12 == 0) &&
                        (i % 11 == 0)
                    )
                {
                    break;
                }
                i += 20;
            }
            DateTime end = DateTime.Now;
            TimeSpan elapsed = end - begin;
            Console.WriteLine(i);
            Console.WriteLine(elapsed.TotalMilliseconds + "ms");
        }
    }
}


asked May 10 '11 15:05

rianjs

3 Answers

1. To time code execution, you should use the Stopwatch class (System.Diagnostics.Stopwatch).
2. You also have to account for JIT compilation, runtime start-up and similar one-time costs, so run the test a sufficient number of times (say 10,000 or 100,000) and take an average. It is important to run the code multiple times, not the program: put the code in a method and call it in a loop from Main to get your measurements.
3. Remove all debugging support from the assemblies and run the code stand-alone, in a Release build.
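The same advice applies on the Java side. Here is a minimal sketch of that structure in Java: the loop is moved into a method, called a few times so the JIT can compile it, and then timed over repeated calls with System.nanoTime (used here as the closest Java counterpart to Stopwatch). The class name and the run counts are illustrative assumptions. The counts are kept small because each call already runs about 11.6 million loop iterations.

```java
// Minimal benchmarking sketch. Class name and run counts are illustrative assumptions.
class Problem005Bench {

    // The code under test, moved into a method so it can be called repeatedly.
    static int solve() {
        int i = 20;
        while (true) {
            if ((i % 19 == 0) && (i % 18 == 0) && (i % 17 == 0)
                    && (i % 16 == 0) && (i % 15 == 0) && (i % 14 == 0)
                    && (i % 13 == 0) && (i % 12 == 0) && (i % 11 == 0)) {
                return i;
            }
            i += 20;
        }
    }

    public static void main(String[] args) {
        final int warmupRuns = 5;    // untimed calls that give the JIT a chance to compile solve()
        final int measuredRuns = 10; // timed calls to average over
        int result = 0;

        for (int r = 0; r < warmupRuns; r++) {
            result = solve();
        }

        long start = System.nanoTime();
        for (int r = 0; r < measuredRuns; r++) {
            result = solve();
        }
        long elapsed = System.nanoTime() - start;

        // Print the result so the calls' output is actually used.
        System.out.println(result);
        System.out.printf("average: %.3f ms%n", elapsed / 1e6 / measuredRuns);
    }
}
```

For anything more serious, a dedicated harness such as JMH handles warm-up, dead-code elimination and similar pitfalls more thoroughly than a hand-written loop.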

answered Sep 20 '22 00:09

Femaref

There are a few optimizations possible. Maybe the Java JIT is performing them and the CLR is not.

Optimization #1:

(x % a == 0) && (x % b == 0) && ... && (x % z == 0)

is equivalent to

(x % lcm(a, b, ... z) == 0)

So in your example the comparison chain could be replaced by

if (i % 232792560 == 0) break;

(but of course if you've already calculated the LCM, there's little point in running the program in the first place!)
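Optimization #1 is easy to check mechanically. This sketch computes the LCM with Euclid's gcd and spot-checks the equivalence; the class and method names are illustrative. Note that lcm(11..19) is already 232792560, because 20 = 4 * 5 and both factors are covered by 16 and 15. The chain in the question is therefore equivalent to i % 232792560 == 0 for every int i, not only for multiples of 20.

```java
class LcmCheck {

    // Greatest common divisor (Euclid's algorithm).
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Least common multiple of every integer in [from, to].
    static long lcmRange(int from, int to) {
        long result = 1;
        for (int k = from; k <= to; k++) {
            result = result / gcd(result, k) * k; // divide before multiplying to delay overflow
        }
        return result;
    }

    // The comparison chain from the question.
    static boolean chain(int i) {
        return (i % 19 == 0) && (i % 18 == 0) && (i % 17 == 0)
            && (i % 16 == 0) && (i % 15 == 0) && (i % 14 == 0)
            && (i % 13 == 0) && (i % 12 == 0) && (i % 11 == 0);
    }

    // Spot check: compare the chain with a single modulo near every multiple of the
    // LCM that fits in an int (9 * 232792560 < Integer.MAX_VALUE).
    static int countMismatches(long lcm) {
        int mismatches = 0;
        for (int k = -9; k <= 9; k++) {
            for (int delta = -1000; delta <= 1000; delta++) {
                int i = (int) (k * lcm) + delta;
                if (chain(i) != (i % lcm == 0)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        long lcm = lcmRange(11, 20);
        System.out.println(lcm);                  // 232792560
        System.out.println(countMismatches(lcm)); // 0
    }
}
```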

Optimization #2:

This is also equivalent:

if (i % (14549535 * 16) == 0) break;

or

if ((i % 16 == 0) && (i % 14549535 == 0)) break;

The first division can be replaced with a mask and a comparison against zero:

if (((i & 15) == 0) && (i % 14549535 == 0)) break;
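These rewrites rely on 232792560 = 16 * 14549535. Because 14549535 is odd, it shares no factor with 16, so divisibility by the product splits into two independent tests. In two's complement, an int is a multiple of 16 exactly when its low four bits are zero, and this also holds for negative values. Here is a small sketch comparing the forms; the names are illustrative, and the comparison is a spot check, not a proof.

```java
class Optimization2Check {

    static final int ODD_PART = 14549535;  // 232792560 / 16
    static final int LCM = ODD_PART * 16;  // 232792560, still fits in an int

    static boolean byLcm(int i)   { return i % LCM == 0; }
    static boolean bySplit(int i) { return (i % 16 == 0) && (i % ODD_PART == 0); }
    static boolean byMask(int i)  { return ((i & 15) == 0) && (i % ODD_PART == 0); }

    static boolean agree(int i) {
        boolean expected = byLcm(i);
        return bySplit(i) == expected && byMask(i) == expected;
    }

    static int countMismatches() {
        int mismatches = 0;
        // Values near every multiple of LCM that fits in an int (9 * LCM < Integer.MAX_VALUE).
        for (int k = -9; k <= 9; k++) {
            for (int delta = -100_000; delta <= 100_000; delta++) {
                if (!agree(k * LCM + delta)) {
                    mismatches++;
                }
            }
        }
        // Multiples of the odd part alone (147 * ODD_PART < Integer.MAX_VALUE).
        for (int m = -147; m <= 147; m++) {
            if (!agree(m * ODD_PART)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches()); // 0
    }
}
```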

The second division can be replaced by a multiplication by the modular inverse:

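final long LCM = 14549535;                  // odd part of 232792560 (= 232792560 / 16)
final long INV_LCM = 8384559098224769503L; // == 14549535**-1 mod 2**64
final long MAX_QUOTIENT = Long.MAX_VALUE / LCM;
// ...
// Once the low four bits are zero, (i>>4) == i/16 (for i >= 0). Multiplying by the
// inverse gives exactly (i>>4)/LCM when LCM divides (i>>4), which lies in
// [0, MAX_QUOTIENT]; otherwise the wrapped product falls outside that range.
if (((i & 15) == 0) &&
    (0 <= (i>>4) * INV_LCM) &&
    ((i>>4) * INV_LCM <= MAX_QUOTIENT)) {
    break;
}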

It is somewhat unlikely that the JIT is employing this, but it is not as far-fetched as you might think - some C compilers implement pointer subtraction this way.
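You do not have to type the INV_LCM constant in by hand. This sketch derives it with Newton's iteration x = x * (2 - d * x), which works because Java's long multiplication wraps modulo 2^64. It then uses the constant for the divisibility test, assuming i >= 0 as in the question's loop. The class and method names are illustrative.

The test is exact for every non-negative long x. If d divides x, then x * INV is exactly x / d, which is at most MAX_QUOTIENT. Conversely, any q in [0, MAX_QUOTIENT] satisfies q * d <= Long.MAX_VALUE, so q * d ≡ x (mod 2^64) forces q * d == x.

```java
class ModularInverseDivisibility {

    static final long ODD_PART = 14549535L;                     // 232792560 / 16
    static final long INV = inverse(ODD_PART);                  // ODD_PART * INV == 1 (mod 2^64)
    static final long MAX_QUOTIENT = Long.MAX_VALUE / ODD_PART;

    // Inverse of an odd d modulo 2^64 by Newton's iteration x = x * (2 - d * x).
    // x = d is already correct in the low 3 bits (d * d == 1 mod 8 for odd d), and each
    // step doubles the number of correct bits: 3 -> 6 -> 12 -> 24 -> 48 -> 96 >= 64.
    static long inverse(long d) {
        long x = d;
        for (int step = 0; step < 5; step++) {
            x *= 2 - d * x; // long arithmetic wraps modulo 2^64
        }
        return x;
    }

    // Exact for every x in [0, Long.MAX_VALUE]: true iff ODD_PART divides x.
    static boolean divisibleByOddPart(long x) {
        long q = x * INV;
        return 0 <= q && q <= MAX_QUOTIENT;
    }

    // Division-free replacement for i % 232792560 == 0, valid for i >= 0.
    static boolean divisibleByLcm(int i) {
        return ((i & 15) == 0) && divisibleByOddPart(i >> 4);
    }

    public static void main(String[] args) {
        System.out.println(INV);             // 8384559098224769503
        System.out.println(ODD_PART * INV);  // 1

        int i = 20;
        while (!divisibleByLcm(i)) {
            i += 20;
        }
        System.out.println(i);               // 232792560
    }
}
```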

answered Sep 20 '22 00:09

finnw

The key to making these two become closer is to ensure that the comparison is fair.

First of all, make sure you are not paying the costs associated with running Debug builds and loading pdb symbols, as you did.

Next, you need to ensure that no initialization costs are being counted. Obviously these are real costs and may matter to some people, but in this instance we are interested in the loop itself.

Next, you need to deal with platform-specific behaviour. If you are on a 64-bit Windows machine, you may be running in either 32-bit or 64-bit mode. In 64-bit mode the JIT is different in many respects, often altering the resulting code considerably. Specifically, and I would guess pertinently, you get access to twice as many general-purpose registers.

In this case, the inner section of the loop, when naively translated into machine code, would need to load the constants used in the modulo tests into registers. If there are not enough registers to hold everything the loop needs, it must load values from memory instead. Even coming from the L1 cache, this would be a significant hit compared to keeping it all in registers.

In VS 2010, MS changed the default target from AnyCPU to x86. I have nothing like the resources or customer-facing knowledge of MSFT, so I won't try to second-guess that. However, anyone doing the kind of performance analysis you are doing should certainly try both.

Once those disparities are ironed out, the numbers seem far more reasonable. Any further differences likely require more than educated guesses; they would need investigation into the actual differences in the generated machine code.

There are several things about this that I think would be interesting for an optimising compiler.

The ones finnw already mentioned:
- The LCM option is interesting, but I can't see a compiler writer bothering.
- The reduction of division to multiplication and masking.
  - I don't know enough about this, but other people have tried it; note that they single out the divider on the more recent Intel chips as significantly better.
  - Perhaps you could even arrange something complex with SSE2.
  - Certainly the modulo 16 operation is ripe for conversion into a mask or shift.
- A compiler could spot that none of the tests have side effects.
  - It could speculatively evaluate several of them at once. On a superscalar processor this could pump things along quite a bit faster, but it would depend heavily on how well the compiler's code layout interacted with the out-of-order execution engine.
- If register pressure was tight, you could implement the constants as a single variable, set at the start of each loop iteration and then stepped (19, 18, ..., 11) as you go along.
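At the source level, the single-variable idea from the last point can be sketched as follows. The class name is illustrative. The divisor starts at 19 and is decremented after each successful test, so one variable stands in for the nine separate constants. This is only an illustration of the idea. Whether a JIT would choose this shape, or whether it runs faster, is not established here. The trade-off is that a modulo by a variable cannot use the multiply-by-constant tricks compilers typically apply to a modulo by a known constant.

```java
class SingleDivisorVariable {

    // Same search as the question, but the divisors 19..11 come from one variable.
    static int solve() {
        int i = 20;
        while (true) {
            int d = 19;
            while (d >= 11 && i % d == 0) {
                d--; // step to the next divisor only while the tests keep passing
            }
            if (d < 11) { // every divisor from 19 down to 11 divided i
                return i;
            }
            i += 20;
        }
    }

    public static void main(String[] args) {
        System.out.println(solve()); // 232792560
    }
}
```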

These are all utter guesses and should be viewed as idle meanderings. If you want to know for sure, disassemble it.

answered Sep 21 '22 00:09

ShuggyCoUk
