I've been wondering why the sixteen 256-bit registers provided by AVX2 aren't used to hold ordinary register values when AVX can't otherwise help, to avoid hitting the cache in situations where you just don't have enough registers at hand. Isn't it the case that you can write and read AVX registers in 1-2 cycles?
Of course, none of this would work if it clobbered registers that other code is using for AVX work. I haven't seen this seemingly obvious approach used anywhere, which led me to ask this question.
At one time, Intel indeed recommended spilling from general purpose to SSE registers in their optimization manual. (That's not AVX exactly, but it is the same idea.) I haven't looked at the very latest manuals, so that advice may or may not be out of date.
Spilling to xmm registers has the disadvantage that those registers are not preserved across function calls. Given that x86-64 is a register-memory machine, accessing spilled values on the stack also requires fewer instructions and fewer registers (compare "add rax, [rsp+k]" with "movq rbx, xmm0" followed by "add rax, rbx"). That might go some way to explaining why there isn't much interest in the technique.
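To make that comparison concrete, here is a minimal C sketch of the two spill styles. It assumes an x86-64 target, where SSE2 is part of the baseline. The function names are hypothetical and chosen only for illustration. The intrinsics usually compile to the movq forms mentioned above, but the compiler is free to pick other instructions or keep the value in a general-purpose register.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */

/* Spill to an xmm register: the value must be moved back to a
   general-purpose register before the integer add can use it. */
int64_t add_with_xmm_spill(int64_t acc, int64_t spilled) {
    __m128i slot = _mm_cvtsi64_si128(spilled);  /* typically movq xmm, r64 */
    int64_t reloaded = _mm_cvtsi128_si64(slot); /* typically movq r64, xmm */
    return acc + reloaded;
}
#else
/* Hypothetical fallback so the sketch still compiles on other targets. */
int64_t add_with_xmm_spill(int64_t acc, int64_t spilled) {
    return acc + spilled;
}
#endif

/* Spill to memory: the add can take the stack slot directly as a memory
   operand, so no extra move and no scratch register are needed. */
int64_t add_with_stack_spill(int64_t acc, const int64_t *slot) {
    return acc + *slot;  /* can fold into a single add reg, [mem] */
}
```

Both functions return the same result. The difference is only in how many instructions and registers the reload costs, which is the point the answer above makes.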