I see lots of people use subtraction in a qsort comparator function. I think it is wrong, because it fails with numbers like these: int nums[] = {-2147483648, 1, 2, 3}; (note that INT_MIN = -2147483648):
int compare(const void *a, const void *b)
{
    return (*(int *)a - *(int *)b);
}
I wrote this function to test:
#include <stdio.h>
#include <limits.h>

int compare(const void *a, const void *b)
{
    return (*(int *)a - *(int *)b);
}

int main(void)
{
    int a = 1;
    int b = INT_MIN;
    printf("%d %d\n", a, b);
    printf("%d\n", compare((void *)&a, (void *)&b));
    return 0;
}
The output is:
1 -2147483648
-2147483647
but a > b, so the output should be positive.
I have seen many books write it like this. I think it is wrong; when dealing with int types, it should be written like this:
int compare (const void * a, const void * b)
{
if(*(int *)a < *(int *)b)
return -1;
else if(*(int *)a > *(int *)b)
return 1;
else
return 0;
}
I just cannot figure out why many books and web sites write in such a misleading way. If you have any different view, please let me know.
qsort passes each pair it needs to compare to cmpfunc and uses its return value to decide which one is greater, then sorts the array accordingly. Basically, if the compare function returns a positive result, the first argument is greater than the second one. Similarly, if it returns a negative result, the second argument is greater.
The cmp function returns a negative integer value if the first argument is less than the second, a positive integer value if the first argument is greater than the second, and zero if the arguments are equal. The function takes void pointers so that qsort may be used with any data type.
(At a minimum.) See stackoverflow.com/questions/53821538/… for someone who encountered problems when they tried to use it. Whenever the sorting algorithm needs to find out which of two elements should be placed before the other, it calls the comparison function with pointers to the two elements.
However, there exists an interesting approach with a little modification in the comparator function of Quick Sort. The idea is to write a comparator function that takes two addresses p and q as arguments. Let l and r be the numbers pointed to by p and q.
I think it is wrong

Yes, a simple subtraction can lead to int overflow, which is undefined behavior and should be avoided.
return *(int*)a - *(int*)b; // Potential undefined behavior.
A common idiom is to subtract two integer comparisons. Various compilers recognize this and generate efficient, well-behaved code. Preserving const-ness is also good form.
int compare(const void *a, const void *b)
{
    const int *ca = a;
    const int *cb = b;
    return (*ca > *cb) - (*ca < *cb);
}
why many books and web sites write in such a misleading way.
return *a - *b;
is conceptually easy to digest - even if it gives the wrong answer for extreme values. Learner code often omits edge conditions to get the idea across, "knowing" that values will never be large.
Or consider the complexities of comparing long doubles with regard to NaN.
Your understanding is absolutely correct. This common idiom cannot be used for int values.
Your proposed solution works correctly, although it would be more readable with local variables to avoid so many casts:
int compare(const void *a, const void *b) {
    const int *aa = a;
    const int *bb = b;
    if (*aa < *bb)
        return -1;
    else if (*aa > *bb)
        return 1;
    else
        return 0;
}
Note that modern compilers will generate the same code with or without these local variables: always prefer the more readable form.
A more compact solution with exactly the same result is commonly used, although it is a bit more difficult to understand:
int compare(const void *a, const void *b) {
    const int *aa = a;
    const int *bb = b;
    return (*aa > *bb) - (*aa < *bb);
}
Note that this approach works for all numeric types, but will return 0 for NaN floating point values.
As for your remark, "I just cannot figure out why many books and web sites write in such a misleading way":
Many books and websites contain mistakes, and so do most programs. Many programming bugs get caught and squashed before they reach production if the program is tested wisely. Code fragments in books are not tested, and although they never reach production, the bugs they contain do propagate virally via unsuspecting readers who learn bogus methods and idioms. A very bad and lasting side effect.
Kudos to you for catching this! You have a rare skill among programmers: you are a good reader. There are far more programmers who write code than programmers who can read code correctly and see mistakes. Hone this skill by reading other people's code, on Stack Overflow or from open source projects... And do report the bugs.
The subtraction method is in common use; I have seen it in many places, like you, and it does work for most value pairs, so this bug may go unnoticed for eons. A similar problem was latent in zlib for decades: int m = (a + b) / 2; causes a fateful integer overflow for large int values of a and b.
The author probably saw it used and thought the subtraction was cool and fast, worth showing in print.
Note however that the erroneous function does work correctly for types smaller than int: signed or unsigned char and short, if these types are indeed smaller than int on the target platform, which the C Standard does not mandate.
Indeed, similar code can be found in The C Programming Language by Brian Kernighan and Dennis Ritchie, the famous K&R C bible by its inventors. They use this approach in a simplistic implementation of strcmp() in chapter 5. The code in the book is dated, going all the way back to the late seventies. Although it has implementation-defined behavior, it does not invoke undefined behavior on any but the rarest architectures, among which the infamous DeathStation 9000; yet it should not be used to compare int values.