How does jvm optimize loop code?

Question

There is a method that can search substring from a text(use brute force algorithm, please ignore null pointer)

public static int forceSearch(String text, String pattern) {
    int patternLength = pattern.length();
    int textLength = text.length();

    for (int i = 0, n = textLength - patternLength; i <= n; i++) {
        int j = 0;
        for (; j < patternLength && text.charAt(i + j) == pattern.charAt(j); j++) {
            ;
        }
        if (j == patternLength) {
            return i;
        }
    }
    return -1;
}

Strangely! Use the same algorithm, but the following code is more faster!!!

public static int forceSearch(String text, String pattern) {
    int patternLength = pattern.length();
    int textLength = text.length();

    char first = pattern.charAt(0);
    for (int i = 0, n = textLength - patternLength; i <= n; i++) {
        if (text.charAt(i) != first) {
            while (++i <= n && text.charAt(i) != first)
                ;
        }

        int j = 0;
        for (; j < patternLength && text.charAt(i + j) == pattern.charAt(j); j++) {
            ;
        }
        if (j == patternLength) {
            return i;
        }
    }
    return -1;
}

I found the second code is obviously faster than first if I run it by jvm. Howere, when I write it in c and run, the two functions take almost the same time. So I think the reason is that jvm optimize loop code

if (text.charAt(i) != first) {
    while (++i <= max && text.charAt(i) != first)
        ;
}

Am I right? If I'm right, how should we use the jvm optimization strategy to optimize our code?

Hope somebody help, thankyou:)

Mike Strobel · Accepted Answer

If you really want to get to the bottom of this, you'll probably need to instruct the JVM to print the assembly. In my experience, minor tweaks to loops can cause surprising performance differences. But it's not necessarily due to optimizations of the loop itself.

There are plenty of factors that can affect how your code gets JIT compiled. For example, tweaking the size of a method can affect your inlining tree, which could mean better or worse performance depending on what your call stack looks like. If a method gets inlined further up the call stack, it could prevent nested call sites from being inlined into the same frame. If those nested call sites are especially 'hot', the added call overhead could be substantial. I'm not saying that's the cause here; I'm merely pointing out that there are many thresholds that govern how the JIT arranges your code, and the reasons for performance differences are not always obvious.

One nice thing about using JMH for benchmarks is that you can reduce the influence of such changes by explicitly setting inlining boundaries. But you can use -XX:CompileCommand to achieve the same effects manually.

There are, of course, other factors like cache friendliness that require more intuitive analysis. Given that your benchmark probably doesn't have a particularly deep call tree, I'm inclined to lean towards cache behavior as a more likely explanation. I would guess that your second version performs better because your comparand (the first chunk of pattern) is usually in your L1 cache, while your first version causes more cache churn. If your inputs are long (and it sounds like they are), then this is a likely explanation. If not, the reasons could be more subtle, e.g., your first version could be 'tricking' the CPU into employing more aggressive cache prefetching, but in a way that actually hurts performance (at least for the inputs you are benchmarking). Regardless, if cache behavior is to explain, then I wonder why you do not see a similar difference in the C versions. What optimization flags are you compiling the C version with?

Dead code elimination might also be a factor. I would have to see what your inputs are, but it's possible that your hand-optimized version causes certain instruction blocks to never be hit during the instrumented interpretation phase, leading the JIT to exclude them from the final assembly.

I reiterate: if you want to get to the bottom of this, you'll want to force the JIT to dump the assembly for each version (and compare to the C versions as well).

Mạnh Quyết Nguyễn · Answer

This if statement simplify a lot of work (especially when the pattern is found at the end of the input string.

   if (text.charAt(i) != first) {
        while (++i <= n && text.charAt(i) != first)
            ;
    }

In the first version, you have to check j < patternLength for every i before comparing the first character.

In the second version you don't need to.

But strangely I think for small input it does not make much different.

Could you share the length of items you used to benchmark?

How does jvm optimize loop code?

Tags:

java

string

algorithm

optimization

jvm

Sam

2 Answers

Mike Strobel

Mạnh Quyết Nguyễn

Recent Activity

Donate For Us

How does jvm optimize loop code?

Tags:

java

string

algorithm

optimization

jvm

Sam

2 Answers

Mike Strobel

Mạnh Quyết Nguyễn

Related questions

Recent Activity

Donate For Us