n is negative, positive or zero? return 1, 2, or 4

Question

I'm building a PowerPC interpreter, and it works quite well. In the Power architecture the condition register CR0 (EFLAGS on x86) is updated on almost any instruction. It is set like this. The value of CR0 is 1, if the last result was negative, 2 if the last result was positive, 4 otherwise.

My first naive method to interpret this is:

if (n < 0)     cr0 = 1 else if (n > 0)     cr0 = 2; else     cr0 = 4;

However I understand that all those branches won't be optimal, being run millions of times per second. I've seen some bit hacking on SO, but none seemed adeguate. For example I found many examples to convert a number to -1, 0, or 1 accordingly to the sign or 0. But how to make -1 = 1, 1 = 2, 0 = 4? I'm asking for the help of the Bit Hackers...

Thanks in advance

Update: First of all: thanks guys, you've been great. I'll test all of your codes carefully for speed and you'll be the first to know who's the winner.

@jalf: About your first advice, I wasn't actually calculating CR0 on every instruction. I was rather keeping a lastResult variable, and when (and if) a following instruction asked for a flag, do the comparison. Three main motivations took me back to "everytime" update:

On PPC you're not forced to update CR0 like on x86 (where ADD always change EFLAGS, even if not needed), you have two flavours of ADD, one updating. If the compiler chooses to use the updating one, it means that it's going to use the CR0 at some point, so there no point at delaying...
There's a particularly painful instruction called mtcrf, that enables you to change the CR0 arbitrarly. You can even set it to 7, with no arithmetic meaning... This just destroys the possibility of keeping a "lastResult" variable.

jalf · Accepted Answer

First, if this variable is to be updated after (nearly) every instruction, the obvious piece of advice is this:

don't

Only update it when the subsequent instructions need its value. At any other time, there's no point in updating it.

But anyway, when we update it, what we want is this behavior:

R < 0  => CR0 == 0b001  R > 0  => CR0 == 0b010 R == 0 => CR0 == 0b100

Ideally, we won't need to branch at all. Here's one possible approach:

Set CR0 to the value 1. (if you really want speed, investigate whether this can be done without fetching the constant from memory. Even if you have to spend a couple of instructions on it, it may well be worth it)
If R >= 0, left shift by one bit.
If R == 0, left shift by one bit

Where steps 2 and 3 can be transformed to eliminate the "if" part

CR0 <<= (R >= 0); CR0 <<= (R == 0);

Is this faster? I don't know. As always, when you are concerned about performance, you need to measure, measure, measure.

However, I can see a couple of advantages of this approach:

we avoid branches completely
we avoid memory loads/stores.
the instructions we rely on (bit shifting and comparison) should have low latency, which isn't always the case for multiplication, for example.

The downside is that we have a dependency chain between all three lines: Each modifies CR0, which is then used in the next line. This limits instruction-level parallelism somewhat.

To minimize this dependency chain, we could do something like this instead:

CR0 <<= ((R >= 0) + (R == 0));

so we only have to modify CR0 once, after its initialization.

Or, doing everything in a single line:

CR0 = 1 << ((R >= 0) + (R == 0));

Of course, there are a lot of possible variations of this theme, so go ahead and experiment.

n is negative, positive or zero? return 1, 2, or 4

Tags:

c++

bit-manipulation

bit-shift

bit

Francesco Boffa

1 Answers

jalf

Recent Activity

Donate For Us

n is negative, positive or zero? return 1, 2, or 4

Tags:

c++

bit-manipulation

bit-shift

bit

Francesco Boffa

1 Answers

jalf

Related questions

Recent Activity

Donate For Us