Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is a pointer indirection more costly than a conditional?

Is a pointer indirection (to fetch a value) more costly than a conditional?

I've observed that most decent compilers can precompute a pointer indirection to varying degrees--possibly removing most branching instructions--but what I'm interested in is whether the cost of an indirection is greater than the cost of a branch point in the generated code.

I would expect that if the data referenced by the pointer is not in a cache at runtime that a cache flush might occur, but I don't have any data to back that.

Does anyone have solid data (or a justifiable opinion) on the matter?


EDIT: Several posters noted that there is no "general case" on the cost of branching: it varies wildly from chip to chip.

If you happen to know of a notable case where branching would be cheaper (with or without branch prediction) than an in-cache indirection, please mention it.

like image 833
Philip Conrad Avatar asked Jan 10 '13 00:01

Philip Conrad


1 Answers

This is very much dependant on the circumstances.

1 How often is the data in cache (L1, L2, L3) or and how often it must be fetched all the way from the RAM?

A fetch from RAM will take around 10-40ns. Of course, that will fill a whole cache-line in little more than that, so if you then use the next few bytes as well, it will definitely not "hurt as bad".

2 What processor is it?

Older Intel Pentium4 were famous for their long pipeline stages, and would take 25-30 clockcycles (~15ns at 2GHz) to "recover" from a branch that was mispredicted.

3 How "predictable" is the condition?

Branch prediction really helps in modern processors, and they can cope quite well with "unpredictable" branches too, but it does hurt a little bit.

4 How "busy" and "dirty" is the cache?

If you have to throw out some dirty data to fill the cache-line, it will take another 15-50ns on top of the "fetch the data in" time.

The indirection itself will be a fast instruction, but of course, if the next instruction uses the data immediately after, you may not be able to execute that instruction immediately - even if the data is in L1 cache.

On a good day (well predicted, target in cache, wind in the right direction, etc), a branch, on the other hand, takes 3-7 cycles.

And finally, of course, the compiler USUALLY knows quite well what works best... ;)

In summary, it's hard to say for sure, and the only way to tell what is better IN YOUR case would be to benchmark alternative solutions. I would thin that an indirect memory access is faster than a jump, but without seeing what code your source compiles to, it's quite hard to say.

like image 140
Mats Petersson Avatar answered Sep 28 '22 09:09

Mats Petersson