I'm building a PowerPC interpreter, and it works quite well. In the Power architecture the condition register CR0 (EFLAGS on x86) is updated on almost any instruction. It is set like this. The value of CR0 is 1, if the last result was negative, 2 if the last result was positive, 4 otherwise.
My first naive method to interpret this is:
if (n < 0) cr0 = 1 else if (n > 0) cr0 = 2; else cr0 = 4;
However I understand that all those branches won't be optimal, being run millions of times per second. I've seen some bit hacking on SO, but none seemed adeguate. For example I found many examples to convert a number to -1, 0, or 1 accordingly to the sign or 0. But how to make -1 = 1, 1 = 2, 0 = 4? I'm asking for the help of the Bit Hackers...
Thanks in advance
Update: First of all: thanks guys, you've been great. I'll test all of your codes carefully for speed and you'll be the first to know who's the winner.
@jalf: About your first advice, I wasn't actually calculating CR0 on every instruction. I was rather keeping a lastResult variable, and when (and if) a following instruction asked for a flag, do the comparison. Three main motivations took me back to "everytime" update:
First, if this variable is to be updated after (nearly) every instruction, the obvious piece of advice is this:
don't
Only update it when the subsequent instructions need its value. At any other time, there's no point in updating it.
But anyway, when we update it, what we want is this behavior:
R < 0 => CR0 == 0b001 R > 0 => CR0 == 0b010 R == 0 => CR0 == 0b100
Ideally, we won't need to branch at all. Here's one possible approach:
1
. (if you really want speed, investigate whether this can be done without fetching the constant from memory. Even if you have to spend a couple of instructions on it, it may well be worth it)Where steps 2 and 3 can be transformed to eliminate the "if" part
CR0 <<= (R >= 0); CR0 <<= (R == 0);
Is this faster? I don't know. As always, when you are concerned about performance, you need to measure, measure, measure.
However, I can see a couple of advantages of this approach:
The downside is that we have a dependency chain between all three lines: Each modifies CR0, which is then used in the next line. This limits instruction-level parallelism somewhat.
To minimize this dependency chain, we could do something like this instead:
CR0 <<= ((R >= 0) + (R == 0));
so we only have to modify CR0 once, after its initialization.
Or, doing everything in a single line:
CR0 = 1 << ((R >= 0) + (R == 0));
Of course, there are a lot of possible variations of this theme, so go ahead and experiment.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With