Implementing a timing, cache attack resistant sort in C

Question

Disclaimer: I am well aware implementing your own crypto is a very bad idea. This is part of a master thesis, the code will not be used in practice.

As part of a larger cryptographic algorithm, I need to sort an array of constant length (small, 24 to be precise), without leaking any information on the contents of this array. As far as I know (please correct me if these are not sufficient to prevent timing and cache attacks), this means:

The sort should run in the same amount of cycles in terms of the length of the array, regardless of the particular values of the array
The sort should not branch or access memory depending on the particular values of the array

Do any such implementations exist? If not, are there any good resources on this type of programming?

To be honest, I'm even struggling with the easier subproblem, namely finding the smallest value of an array.

double arr[24]; // some input
double min = DBL_MAX;

int i;
for (i = 0; i < 24; ++i) {
    if (arr[i] < min) {
        min = arr[i];
    }
}

Would adding an else with a dummy assignment be sufficient to make it timing-safe? If so, how do I ensure the compiler (GCC in my case) doesn't undo my hard work? Would this be susceptible to cache attacks?

2501 · Accepted Answer

Use a sorting network, a series of comparisons and swaps.

The swap call must not be dependent on the comparison. It must be implemented in a way to execute the same amount of instructions, regardless of the comparison result.

Like this:

void swap( int* a , int* b , bool c )
{
    const int min = c ? b : a;
    const int max = c ? a : b;
    *a = min;
    *b = max;  
}

swap( &array[0] , &array[1] , array[0] > array[1] );

Then find the sorting network and use the swaps. Here is a generator that does that for you: http://pages.ripco.net/~jgamble/nw.html

Example for 4 elements, the numbers are array indices, generated by the above link:

SWAP(0, 1);
SWAP(2, 3);
SWAP(0, 2);
SWAP(1, 3);
SWAP(1, 2);

Art · Answer

This is a very dumb bubble sort that actually works and doesn't branch or change memory access behavior depending on input data. Not sure if this can be plugged into another sorting algorithm, they need their compares separate from the swaps, but maybe it's possible, working on that now.

#include <stdint.h>

static void
cmp_and_swap(uint32_t *ap, uint32_t *bp)
{
        uint32_t a = *ap;
        uint32_t b = *bp;
        int64_t c = (int64_t)a - (int64_t)b;
        uint32_t sign = ((uint64_t)c >> 63);
        uint32_t min = a * sign + b * (sign ^ 1);
        uint32_t max = b * sign + a * (sign ^ 1);
        *ap = min;
        *bp = max;
}

void
timing_sort(uint32_t *arr, int n)
{
        int i, j;
        for (i = n - 1; i >= 0; i--) {
                for (j = 0; j < i; j++) {
                        cmp_and_swap(&arr[j], &arr[j + 1]);
                }
        }
}

The cmp_and_swap function compiles to (Apple LLVM version 7.3.0 (clang-703.0.29), compiled with -O3):

_cmp_and_swap:
00000001000009e0        pushq   %rbp
00000001000009e1        movq    %rsp, %rbp
00000001000009e4        movl    (%rdi), %r8d
00000001000009e7        movl    (%rsi), %r9d
00000001000009ea        movq    %r8, %rdx
00000001000009ed        subq    %r9, %rdx
00000001000009f0        shrq    $0x3f, %rdx
00000001000009f4        movl    %edx, %r10d
00000001000009f7        negl    %r10d
00000001000009fa        orl     $-0x2, %edx
00000001000009fd        incl    %edx
00000001000009ff        movl    %r9d, %ecx
0000000100000a02        andl    %edx, %ecx
0000000100000a04        andl    %r8d, %edx
0000000100000a07        movl    %r8d, %eax
0000000100000a0a        andl    %r10d, %eax
0000000100000a0d        addl    %eax, %ecx
0000000100000a0f        andl    %r9d, %r10d
0000000100000a12        addl    %r10d, %edx
0000000100000a15        movl    %ecx, (%rdi)
0000000100000a17        movl    %edx, (%rsi)
0000000100000a19        popq    %rbp
0000000100000a1a        retq
0000000100000a1b        nopl    (%rax,%rax)

Only memory accesses are reading and writing of the array, no branches. The compiler did figure out what the multiplication actually does, quite clever actually, but it didn't use branches for that.

The casts to int64_t are necessary to avoid overflows. I'm pretty sure it can be written cleaner.

As requested, here's a compare function for doubles:

void
cmp_and_swap(double *ap, double *bp)
{
        double a = *ap;
        double b = *bp;
        int sign = signbit(a - b);
        double min = a * sign + b * (sign ^ 1);
        double max = b * sign + a * (sign ^ 1);
        *ap = min;
        *bp = max;
}

Compiled code is branchless and doesn't change memory access pattern depending on input data.

Implementing a timing, cache attack resistant sort in C

Tags:

c

bkjvbx

2 Answers

2501

Art

Recent Activity

Donate For Us

Implementing a timing, cache attack resistant sort in C

Tags:

c

bkjvbx

2 Answers

2501

Art

Related questions

Recent Activity

Donate For Us