The 2 following versions of the same function (which basically tries to recover a password by brute force) do not give same performance:
private static final char[] CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
private static final int N_CHARS = CHARS.length;
private static final int MAX_LENGTH = 8;
private static char[] recoverPassword()
{
char word[];
int refi, i, indexes[];
for (int length = 1; length <= MAX_LENGTH; length++)
{
refi = length - 1;
word = new char[length];
indexes = new int[length];
indexes[length - 1] = 1;
while(true)
{
i = length - 1;
while ((++indexes[i]) == N_CHARS)
{
word[i] = CHARS[indexes[i] = 0];
if (--i < 0)
break;
}
if (i < 0)
break;
word[i] = CHARS[indexes[i]];
if (isValid(word))
return word;
}
}
return null;
}
private static final char[] CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
private static final int N_CHARS = CHARS.length;
private static final int MAX_LENGTH = 8;
private static char[] recoverPassword()
{
char word[];
int refi, i, indexes[];
for (int length = 1; length <= MAX_LENGTH; length++)
{
refi = length - 1;
word = new char[length];
indexes = new int[length];
indexes[length - 1] = 1;
while(true)
{
i = refi;
while ((++indexes[i]) == N_CHARS)
{
word[i] = CHARS[indexes[i] = 0];
if (--i < 0)
break;
}
if (i < 0)
break;
word[i] = CHARS[indexes[i]];
if (isValid(word))
return word;
}
}
return null;
}
I would expect version 2 to be faster, as it does (and that is the only difference):
i = refi;
...as compare to version 1:
i = length -1;
However, it's the opposite: version 1 is faster by over 3%! Does someone knows why? Is that due to some optimization done by the compiler?
Thank you all for your answers that far. Just to add that the goal is actually not to optimize this piece of code (which is already quite optimized), but more to understand, from a compiler / CPU / architecture perspective, what could explain such performance difference. Your answers have been very helpful, thanks again!
Key
It is difficult to check this in a micro-benchmark because you cannot say for sure how the code has been optimised without reading the machine code generated, even then the CPU can do plenty of tricks to optimise it future eg. it turns the x86 code in RISC style instructions to actually execute.
A computation takes as little as one cycle and the CPU can perform up to three of them at once. An access to L1 cache takes 4 cycles and for L2, L3, main memory it takes 11, 40-75, 200 cycles.
Storing values to avoid a simple calculation is actually slower in many cases. BTW using division and modulus is quite expensive and caching this value can be worth it when micro-tuning your code.
The correct answer should be retrievable by a deassembler (i mean .class -> .java converter),
but my guess is that the compiler might have decided to get rid of iref altogether and decided to store length - 1 an auxiliary register.
I'm more of a c++ guy, but I would start by trying:
const int refi = length - 1;
inside the for loop. Also you should probably use
indexes[ refi ] = 1;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With