Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to code a qsort comparison function for unsigned integers without using lots of branching statements?

Tags:

c

qsort

I wrote a (qsort-compatible) comparison function for a struct that has some unsigned fields in it:

typedef struct {
   int      a;
   unsigned b;
} T;

int cmp(T t1, T t2)
{
   // Decreasing order in "a"
   if (t1.a < t2.a) return +1;
   if (t1.a > t2.a) return -1;
   // Increasing order in "b"
   if (t1.b < t2.b) return -1;
   if (t1.b > t2.b) return +1;
   return 0;
}

Is there a way to write this function without needing two comparisons per field? I can't use the t1.b - t2.b trick because subtraction for unsigned integer wraps around.

I'm willing to accept answers using GCC extensions.

like image 892
hugomg Avatar asked Jun 09 '15 15:06

hugomg


2 Answers

Use wider math.

Given int and unsigned fields, a given platform very likely supports a wider integer type such as long long that accommodates putting these 2 together.

int cmp(T t1, T t2)
{
   // An optimized compilation will not do any multiplication nor addition,
   // Just simply load `n1` high/low halves with `t1.a`, `t1.b`.
   long long n1 = t1.a * (UINT_MAX + 1LL) + t1.b;
   long long n2 = t2.a * (UINT_MAX + 1LL) + t2.b;
   return (n1 > n2) - (n1 < n2);  
}

If this approach is faster - profiling will answer that for select platforms.

Although this uses fewer compares, the compares use wider math - possible a zero sum gain.

When a 2x integer width is available as in How to determine integer types that are twice the width as `int` and `unsigned`?. This works. For high portability, stick with OP's original approach.

The (var1 > var2) - (var1 < var2) is considered by some to be branch-less. Of course OP's original code could end with:

return (t1.b > t2.b) - (t1.b < t2.b);
like image 179
chux - Reinstate Monica Avatar answered Oct 15 '22 06:10

chux - Reinstate Monica


Any relational comparison between two values can only yield one of two results. You need three distinct results for a qsort comparison function, so a single comparison can't do the job. (Perl has a <=> operator that does exactly what you want, but it's not available in C.)

You'll need to evaluate 1 or 2 comparisons to compare the a values, plus 1 or 2 comparisons to compare the b values, for a total of up to 4 comparisons. You can write code that performs them more tersely, but it's going to be essentially equivalent to what you've already written.

Here's one possible slightly tricky solution:

int cmp(T t1, T t2) {
    return ((t2.a > t1.a) - (t2.a < t1.a)) || ((t2.b > t1.b) - (t2.b < t1.b));
}

I'd split it up like this:

int cmp(T t1, T t2) {
    return ((t2.a > t1.a) - (t2.a < t1.a))
           ||
           ((t2.b > t1.b) - (t2.b < t1.b));
}

The first half of the expression yields 0 if t1.a and t2.a are equal, -1 if t1.a < t2.a, and +1 if t1.a > t2.a. It depends on the fact that the relational operators always return either 0 or 1.

If the first half is either -1 or +1, the || short-circuits, and we're done; otherwise it goes on to compare the t1.b vs. t2.b.

This might actually be slightly less efficient than the code in your question since it always evaluates both t2.a > t1.a and t2.a < t1.a.

Incidentally, that's not a valid qsort comparison function. Such a function must take two const void* arguments. It can be written like this:

int cmp(const void *arg1, const void *arg2) {
    const T *t1 = arg1;
    const T *t2 = arg2;
    return ((t2->a > t1->a) - (t2->a < t1->a))
           ||
           ((t2->b > t1->b) - (t2->b < t1->b));
}
like image 23
Keith Thompson Avatar answered Oct 15 '22 06:10

Keith Thompson