Disclaimer: I am well aware implementing your own crypto is a very bad idea. This is part of a master thesis, the code will not be used in practice.
As part of a larger cryptographic algorithm, I need to sort an array of constant length (small, 24 to be precise), without leaking any information on the contents of this array. As far as I know (please correct me if these are not sufficient to prevent timing and cache attacks), this means:
Do any such implementations exist? If not, are there any good resources on this type of programming?
To be honest, I'm even struggling with the easier subproblem, namely finding the smallest value of an array.
double arr[24]; // some input
double min = DBL_MAX;
int i;
for (i = 0; i < 24; ++i) {
if (arr[i] < min) {
min = arr[i];
}
}
Would adding an else
with a dummy assignment be sufficient to make it timing-safe? If so, how do I ensure the compiler (GCC in my case) doesn't undo my hard work? Would this be susceptible to cache attacks?
Use a sorting network, a series of comparisons and swaps.
The swap call must not be dependent on the comparison. It must be implemented in a way to execute the same amount of instructions, regardless of the comparison result.
Like this:
void swap( int* a , int* b , bool c )
{
const int min = c ? b : a;
const int max = c ? a : b;
*a = min;
*b = max;
}
swap( &array[0] , &array[1] , array[0] > array[1] );
Then find the sorting network and use the swaps. Here is a generator that does that for you: http://pages.ripco.net/~jgamble/nw.html
Example for 4 elements, the numbers are array indices, generated by the above link:
SWAP(0, 1);
SWAP(2, 3);
SWAP(0, 2);
SWAP(1, 3);
SWAP(1, 2);
This is a very dumb bubble sort that actually works and doesn't branch or change memory access behavior depending on input data. Not sure if this can be plugged into another sorting algorithm, they need their compares separate from the swaps, but maybe it's possible, working on that now.
#include <stdint.h>
static void
cmp_and_swap(uint32_t *ap, uint32_t *bp)
{
uint32_t a = *ap;
uint32_t b = *bp;
int64_t c = (int64_t)a - (int64_t)b;
uint32_t sign = ((uint64_t)c >> 63);
uint32_t min = a * sign + b * (sign ^ 1);
uint32_t max = b * sign + a * (sign ^ 1);
*ap = min;
*bp = max;
}
void
timing_sort(uint32_t *arr, int n)
{
int i, j;
for (i = n - 1; i >= 0; i--) {
for (j = 0; j < i; j++) {
cmp_and_swap(&arr[j], &arr[j + 1]);
}
}
}
The cmp_and_swap
function compiles to (Apple LLVM version 7.3.0 (clang-703.0.29), compiled with -O3):
_cmp_and_swap:
00000001000009e0 pushq %rbp
00000001000009e1 movq %rsp, %rbp
00000001000009e4 movl (%rdi), %r8d
00000001000009e7 movl (%rsi), %r9d
00000001000009ea movq %r8, %rdx
00000001000009ed subq %r9, %rdx
00000001000009f0 shrq $0x3f, %rdx
00000001000009f4 movl %edx, %r10d
00000001000009f7 negl %r10d
00000001000009fa orl $-0x2, %edx
00000001000009fd incl %edx
00000001000009ff movl %r9d, %ecx
0000000100000a02 andl %edx, %ecx
0000000100000a04 andl %r8d, %edx
0000000100000a07 movl %r8d, %eax
0000000100000a0a andl %r10d, %eax
0000000100000a0d addl %eax, %ecx
0000000100000a0f andl %r9d, %r10d
0000000100000a12 addl %r10d, %edx
0000000100000a15 movl %ecx, (%rdi)
0000000100000a17 movl %edx, (%rsi)
0000000100000a19 popq %rbp
0000000100000a1a retq
0000000100000a1b nopl (%rax,%rax)
Only memory accesses are reading and writing of the array, no branches. The compiler did figure out what the multiplication actually does, quite clever actually, but it didn't use branches for that.
The casts to int64_t are necessary to avoid overflows. I'm pretty sure it can be written cleaner.
As requested, here's a compare function for doubles:
void
cmp_and_swap(double *ap, double *bp)
{
double a = *ap;
double b = *bp;
int sign = signbit(a - b);
double min = a * sign + b * (sign ^ 1);
double max = b * sign + a * (sign ^ 1);
*ap = min;
*bp = max;
}
Compiled code is branchless and doesn't change memory access pattern depending on input data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With