Looking at Go standard library, there's a ConstantTimeByteEq function that looks like this:
func ConstantTimeByteEq(x, y uint8) int {
z := ^(x ^ y)
z &= z >> 4
z &= z >> 2
z &= z >> 1
return int(z)
}
Now, I understand the need for constant time string (array, etc.) comparison, as a regular algorithm could short-circuit after the first unequal element. But in this case, isn't a regular comparison of two fixed-sized integers a constant time operation at the CPU level already?
The idea behind this code is to compare all bytes of input using a flag value that will be flipped in any of the comparisons fail. Only when all the bytes were compared, is the ultimate result of the method returned.
The string comparison compares the two keys one byte at a time, stopping as soon as it finds an offset where the two strings do not match. As a result, checkApiKey will take longer if the two keys start with the same bytes.
The point is likely to avoid branch mispredictions, in addition to having the result as 1 or 0 instead of true or false (allowing follow ups as bitwise operations).
Compare how this compiles:
var a, b, c, d byte
_ = a == b && c == d
=>
0017 (foo.go:15) MOVQ $0,BX
0018 (foo.go:15) MOVQ $0,DX
0019 (foo.go:15) MOVQ $0,CX
0020 (foo.go:15) MOVQ $0,AX
0021 (foo.go:16) JMP ,24
0022 (foo.go:16) MOVQ $1,AX
0023 (foo.go:16) JMP ,30
0024 (foo.go:16) CMPB BX,DX
0025 (foo.go:16) JNE ,29
0026 (foo.go:16) CMPB CX,AX
0027 (foo.go:16) JNE ,29
0028 (foo.go:16) JMP ,22
0029 (foo.go:16) MOVQ $0,AX
With this:
var a, b, c, d byte
_ = subtle.ConstantTimeByteEq(a, b) & subtle.ConstantTimeByteEq(c, d)
=>
0018 (foo.go:15) MOVQ $0,DX
0019 (foo.go:15) MOVQ $0,AX
0020 (foo.go:15) MOVQ $0,DI
0021 (foo.go:15) MOVQ $0,SI
0022 (foo.go:16) XORQ AX,DX
0023 (foo.go:16) XORQ $-1,DX
0024 (foo.go:16) MOVQ DX,BX
0025 (foo.go:16) SHRB $4,BX
0026 (foo.go:16) ANDQ BX,DX
0027 (foo.go:16) MOVQ DX,BX
0028 (foo.go:16) SHRB $2,BX
0029 (foo.go:16) ANDQ BX,DX
0030 (foo.go:16) MOVQ DX,AX
0031 (foo.go:16) SHRB $1,DX
0032 (foo.go:16) ANDQ DX,AX
0033 (foo.go:16) MOVBQZX AX,DX
0034 (foo.go:16) MOVQ DI,BX
0035 (foo.go:16) XORQ SI,BX
0036 (foo.go:16) XORQ $-1,BX
0037 (foo.go:16) MOVQ BX,AX
0038 (foo.go:16) SHRB $4,BX
0039 (foo.go:16) ANDQ BX,AX
0040 (foo.go:16) MOVQ AX,BX
0041 (foo.go:16) SHRB $2,BX
0042 (foo.go:16) ANDQ BX,AX
0043 (foo.go:16) MOVQ AX,BX
0044 (foo.go:16) SHRB $1,BX
0045 (foo.go:16) ANDQ BX,AX
0046 (foo.go:16) MOVBQZX AX,BX
Although the latter version is longer, it's also linear -- there are no branches.
Not necessarily. And it is hard to tell what the compiler will emit after doing its optimizations. You could end up with different machine code for the highlevel "compare one byte". Leaking just a tiny bit in a side channel might change your encryption from "basically unbreakable" to "hopefully not worth the money needed for a crack".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With