Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange performance drop after innocent changes to a trivial program

Imagine you want to count how many non-ASCII chars a given char[] contains. Imagine, the performance really matters, so we can skip our favorite slogan.

The simplest way is obviously

int simpleCount() {
    int result = 0;
    for (int i = 0; i < string.length; i++) {
        result += string[i] >= 128 ? 1 : 0;
    }
    return result;
}

Then you think that many inputs are pure ASCII and that it could be a good idea to deal with them separately. For simplicity assume you write just this

private int skip(int i) {
    for (; i < string.length; i++) {
        if (string[i] >= 128) break;
    }
    return i;
}

Such a trivial method could be useful for more complicated processing and here it can't do no harm, right? So let's continue with

int smartCount() {
    int result = 0;
    for (int i = skip(0); i < string.length; i++) {
        result += string[i] >= 128 ? 1 : 0;
    }
    return result;
}

It's the same as simpleCount. I'm calling it "smart" as the actual work to be done is more complicated, so skipping over ASCII quickly makes sense. If there's no or a very short ASCII prefix, it can costs a few cycles more, but that's all, right?

Maybe you want to rewrite it like this, it's the same, just possibly more reusable, right?

int smarterCount() {
    return finish(skip(0));
}

int finish(int i) {
    int result = 0;
    for (; i < string.length; i++) {
        result += string[i] >= 128 ? 1 : 0;
    }
    return result;
}

And then you ran a benchmark on some very long random string and get this img The parameters determine the ASCII to non-ASCII ratio and the average length of a non-ASCII sequence, but as you can see they don't matter. Trying different seeds and whatever doesn't matter. The benchmark uses caliper, so the usual gotchas don't apply. The results are fairly repeatable, the tiny black bars at the end denote the minimum and maximum times.

Does anybody have an idea what's going on here? Can anybody reproduce it?

like image 919
maaartinus Avatar asked Nov 03 '13 09:11

maaartinus


1 Answers

Got it.

The difference is in the possibility for the optimizer/CPU to predict the number of loops in for. If it is able to predict the number of repeats up front, it can skip the actual check of i < string.length. Therefore the optimizer needs to know up front how often the condition in the for-loop will succeed and therefore it must know the value of string.length and i.

I made a simple test, by replacing string.length with a local variable, that is set once in the setup method. Result: smarterCount has runtime of about simpleCount. Before the change smarterCount took about 50% longer then simpleCount. smartCount did not change.

It looks like the optimizer looses the information of how many loops it will have to do when a call to another method occurs. That's the reason why finish() immediately ran faster with the constant set, but not smartCount(), as smartCount() has no clue about what i will be after the skip() step. So I did a second test, where I copied the loop from skip() into smartCount().

And voilà, all three methods return within the same time (800-900 ms).

Different runtimes

like image 85
jboi Avatar answered Oct 05 '22 23:10

jboi