Earlier this year Intel published a list of instructions that are guaranteed to have no timing dependency on their data operands. (Initially it was suggested that these are constant-time only when DOITM is enabled, but later it was clarified that they are always constant-time, regardless of DOITM.) Out of curiosity I am looking at how closely real-world crypto implementations conform to this list (i.e. use only instructions from this list).
It turns out this list has a number of oddities. It has MOVDQU, but not MOVUPS, even though the two should be functionally identical. This is not a serious issue: I can simply take the assembly output of the compiler and run sed 's/movups/movdqu/g' on it before assembling.
A more difficult obstacle is that it does not have (V)SHUFPS, even though it clearly has lots of other floating-point shuffling instructions like VPERMILPS/D. SHUFPS is used in BLAKE3.
Is there a known reason this instruction is not included on the constant-time list? What would be a good way to simulate its functionality, using only instructions from this list?
I cannot find an answer to the first question (why it is not in the list), but I have a solution to the second question, namely how to work around this instruction. For the BLAKE3 implementation, the problematic line is
#define _mm_shuffle_ps2(a, b, c) \
  (_mm_castps_si128( \
      _mm_shuffle_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b), (c))))
A drop-in replacement is
#define _mm_shuffle_ps2(a, b, c) \
  (_mm_blend_epi32(_mm_shuffle_epi32((a), (c)), _mm_shuffle_epi32((b), (c)), 0b1100))
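This causes GCC to generate VPSHUFD and VPBLENDD, both of which should be constant-time according to Intel. Note that _mm_blend_epi32 (VPBLENDD) requires AVX2, so this replacement only applies to builds that target AVX2 or later.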
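To see why the replacement selects the same lanes, here is a minimal scalar sketch of the three instructions as I understand their documented behaviour. It is not the BLAKE3 code: the function names (model_shufps, model_pshufd, model_pblendd, count_mismatches, check_shuffle_ps2_model) are made up for illustration, and the models only cover lane selection, not timing. The loop compares the SHUFPS model against "PSHUFD on each input, then PBLENDD with mask 0b1100" for all 256 immediates.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of SHUFPS / _mm_shuffle_ps: the two low result lanes are
   selected from a, the two high lanes from b, each by a 2-bit field of imm. */
static void model_shufps(const uint32_t a[4], const uint32_t b[4],
                         unsigned imm, uint32_t out[4]) {
    out[0] = a[(imm >> 0) & 3];
    out[1] = a[(imm >> 2) & 3];
    out[2] = b[(imm >> 4) & 3];
    out[3] = b[(imm >> 6) & 3];
}

/* Scalar model of PSHUFD / _mm_shuffle_epi32: every result lane i is
   selected from the single source by bits [2i+1:2i] of imm. */
static void model_pshufd(const uint32_t src[4], unsigned imm,
                         uint32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = src[(imm >> (2 * i)) & 3];
    }
}

/* Scalar model of PBLENDD / _mm_blend_epi32: bit i of mask picks lane i
   from b when set, from a when clear. */
static void model_pblendd(const uint32_t a[4], const uint32_t b[4],
                          unsigned mask, uint32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = ((mask >> i) & 1) ? b[i] : a[i];
    }
}

/* Count the immediates (0..255) for which the replacement sequence
   disagrees with the SHUFPS model. Distinct lane values make any wrong
   selection visible. */
static int count_mismatches(void) {
    const uint32_t a[4] = {0xA0, 0xA1, 0xA2, 0xA3};
    const uint32_t b[4] = {0xB0, 0xB1, 0xB2, 0xB3};
    int mismatches = 0;

    for (unsigned imm = 0; imm < 256; imm++) {
        uint32_t want[4], shuf_a[4], shuf_b[4], got[4];

        model_shufps(a, b, imm, want);

        model_pshufd(a, imm, shuf_a);
        model_pshufd(b, imm, shuf_b);
        model_pblendd(shuf_a, shuf_b, 0xC, got); /* 0xC == 0b1100 */

        for (int i = 0; i < 4; i++) {
            if (want[i] != got[i]) {
                mismatches++;
                break;
            }
        }
    }
    return mismatches;
}

/* Convenience check: aborts if any immediate disagrees. */
static void check_shuffle_ps2_model(void) {
    assert(count_mismatches() == 0);
}
```

Calling check_shuffle_ps2_model() from a test harness should pass silently. The blend mask 0b1100 works because lanes 0 and 1 of the first PSHUFD result already hold the lanes SHUFPS would take from a, and lanes 2 and 3 of the second hold the lanes it would take from b.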