
How can I dynamically hint a branch target to an x64 CPU?

I'd like to know how to write efficient jump tables for x64 processors, in C, C++, or assembly. The input is known in advance but impossible to predict algorithmically. Assuming I can look as far ahead as I want in the input stream, is there any way I can dynamically tell the CPU which address the next branch is going to go to?

Essentially, I'd like to programmatically update the Branch Target Buffer. But I'd settle for anything that lets me avoid flushing the pipeline in cases where I, the programmer, know in advance where the next branch is going by looking at the data, but the processor cannot determine this from past patterns.

Realizing this is a very specific question and that I'm likely to have failed to convey it properly, here are a few alternate phrasings:

Is there an x64 equivalent to hbr (Hint for Branch) on the Cell processor?

Does it ever help to move an assembly cmp earlier than its conditional branch, as it did with Itanium?

Is the predicted target of an indirect jump ever based on a register value instead of the last address used?

Thanks!

Nathan Kurz asked Apr 26 '13 08:04





1 Answer

If you are unable to find an exact answer, then you might be able to use the return address predictor instead of the branch target buffer. The general technique is called context threading, and a description can be found in the paper "Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters".

The idea for you would be: if you can look far enough into the future, for each input that determines a control-flow change, you JIT-compile / emit a single direct call instruction into some executable memory. For example, if you had ten units of input, you would emit 10 calls.

When executed, this code should behave nicely: every call is direct, and the return address of each call never changes, so the returns can be handled by the return address predictor rather than the branch target buffer.
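To make the idea concrete, here is a minimal sketch in C, assuming x86-64 Linux. It is not the code from the paper. The three handlers are hypothetical stand-ins written as raw machine-code stubs, and they live in the same buffer as the emitted calls so that every rel32 displacement is guaranteed to fit. The names emit_call_sequence and run_call_sequence are made up for illustration.

```c
#define _DEFAULT_SOURCE        /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

enum { NUM_HANDLERS = 3, STUB_SIZE = 4, CODE_SIZE = 4096 };

/* Hypothetical handlers, each encoded as: add eax, imm8 ; ret */
static const uint8_t handler_stubs[NUM_HANDLERS][STUB_SIZE] = {
    {0x83, 0xC0, 0x01, 0xC3},  /* handler 0: eax += 1   */
    {0x83, 0xC0, 0x0A, 0xC3},  /* handler 1: eax += 10  */
    {0x83, 0xC0, 0x64, 0xC3},  /* handler 2: eax += 100 */
};

/* Lays out the stubs, then "xor eax, eax", one direct call per input
   unit, and a final "ret". Returns the entry offset within buf. */
size_t emit_call_sequence(uint8_t *buf, size_t cap,
                          const uint8_t *input, size_t n)
{
    size_t pos = NUM_HANDLERS * STUB_SIZE;
    size_t entry = pos;

    assert(pos + 2 + 5 * n + 1 <= cap);
    memcpy(buf, handler_stubs, sizeof handler_stubs);

    buf[pos++] = 0x31;                      /* xor eax, eax */
    buf[pos++] = 0xC0;

    for (size_t i = 0; i < n; i++) {
        assert(input[i] < NUM_HANDLERS);
        size_t target = (size_t)input[i] * STUB_SIZE;
        /* rel32 is measured from the end of the 5-byte call instruction */
        int32_t rel = (int32_t)target - (int32_t)(pos + 5);
        buf[pos++] = 0xE8;                  /* call rel32 */
        memcpy(buf + pos, &rel, sizeof rel);
        pos += sizeof rel;
    }
    buf[pos] = 0xC3;                        /* ret to the C caller */
    return entry;
}

/* Emits the sequence into fresh memory, makes it executable, runs it,
   and returns the value left in eax (or -1 if the mapping fails). */
int run_call_sequence(const uint8_t *input, size_t n)
{
    uint8_t *code = mmap(NULL, CODE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return -1;

    size_t entry = emit_call_sequence(code, CODE_SIZE, input, n);
    if (mprotect(code, CODE_SIZE, PROT_READ | PROT_EXEC) != 0) {
        munmap(code, CODE_SIZE);
        return -1;
    }

    int (*fn)(void);
    void *p = code + entry;
    memcpy(&fn, &p, sizeof fn);             /* object -> function pointer */

    int result = fn();
    munmap(code, CODE_SIZE);
    return result;
}
```

Calling run_call_sequence with the input {0, 1, 2, 1} emits four direct calls and returns 121. In a real interpreter the stubs would be replaced by actual handler bodies, and the emitted buffer would have to sit within 2 GB of them for rel32 calls to reach. Whether the cost of emitting the code pays off against the avoided mispredictions is something to measure rather than assume.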

One side note: I am not a CPU architecture person, so I may be simplifying things, but in principle I think this should work.

Peter Goodman answered Oct 08 '22 06:10
