I've observed that most decent compilers can precompute a pointer indirection to varying degrees--possibly removing most branching instructions--but what I'm interested in is whether the cost of an indirection is greater than the cost of a branch point in the generated code.
I would expect that if the data referenced by the pointer is not in a cache at runtime that a cache flush might occur, but I don't have any data to back that.
Does anyone have solid data (or a justifiable opinion) on the matter?
EDIT: Several posters noted that there is no "general case" on the cost of branching: it varies wildly from chip to chip.
If you happen to know of a notable case where branching would be cheaper (with or without branch prediction) than an in-cache indirection, please mention it.
This is very much dependant on the circumstances.
1 How often is the data in cache (L1, L2, L3) or and how often it must be fetched all the way from the RAM?
A fetch from RAM will take around 10-40ns. Of course, that will fill a whole cache-line in little more than that, so if you then use the next few bytes as well, it will definitely not "hurt as bad".
2 What processor is it?
Older Intel Pentium4 were famous for their long pipeline stages, and would take 25-30 clockcycles (~15ns at 2GHz) to "recover" from a branch that was mispredicted.
3 How "predictable" is the condition?
Branch prediction really helps in modern processors, and they can cope quite well with "unpredictable" branches too, but it does hurt a little bit.
4 How "busy" and "dirty" is the cache?
If you have to throw out some dirty data to fill the cache-line, it will take another 15-50ns on top of the "fetch the data in" time.
The indirection itself will be a fast instruction, but of course, if the next instruction uses the data immediately after, you may not be able to execute that instruction immediately - even if the data is in L1 cache.
On a good day (well predicted, target in cache, wind in the right direction, etc), a branch, on the other hand, takes 3-7 cycles.
And finally, of course, the compiler USUALLY knows quite well what works best... ;)
In summary, it's hard to say for sure, and the only way to tell what is better IN YOUR case would be to benchmark alternative solutions. I would thin that an indirect memory access is faster than a jump, but without seeing what code your source compiles to, it's quite hard to say.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With