I see lots of people use subtraction in a qsort comparator function. I think it is wrong because when dealing with these numbers: <code>int nums[]={-2147483648,1,2,3}; INT_MIN = -2147483648;</code> <pre class="prettyprint"><code>int compare (const void * a, const void * b) { return ( *(int*)a - *(int*)b ); } </code></pre> I wrote this function to test: <pre class="prettyprint"><code>#include <stdio.h> #include <limits.h> int compare (const void * a, const void * b) { return ( *(int*)a - *(int*)b ); } int main(void) { int a = 1; int b = INT_MIN; printf("%d %d\n", a,b); printf("%d\n",compare((void *)&a,(void *)&b)); return 0; } </code></pre> The output is: <pre class="prettyprint"><code>1 -2147483648 -2147483647 </code></pre> but <code>a > b</code> so the output should be positive。 I have seen many books write like this. I think it is wrong; it should be written like this when dealing with <code>int</code> types: <pre class="prettyprint"><code>int compare (const void * a, const void * b) { if(*(int *)a < *(int *)b) return -1; else if(*(int *)a > *(int *)b) return 1; else return 0; } </code></pre> I just cannot figure out why many books and web sites write in such a misleading way. If you have any different view, please let me know.

<blockquote> I think it is wrong </blockquote> Yes, a simple subtraction can lead to <code>int</code> overflow which is undefined behavior and should be avoided. <pre class="prettyprint"><code>return *(int*)a - *(int*)b; // Potential undefined behavior. </code></pre> A common idiom is to subtract two integer compares. Various compilers recognize this and create efficient well behaved code. Preserving <code>const</code>-ness also is good form. <pre class="prettyprint"><code>const int *ca = a; const int *cb = b; return (*ca > *cb) - (*ca < *cb); </code></pre> <blockquote> why many books and web sites write in such a misleading way. </blockquote> <code>return *a - *b;</code> is conceptually easy to digest - even if it provides the wrong answer with extreme values - often learner code omits edge conditions to get the idea across - "knowing" that values will never be large. Or consider the complexities of comparing <code>long doubles</code> with regard to NaN.

Your understanding is absolutely correct. This common idiom cannot be used for <code>int</code> values. Your proposed solution works correctly, although it would be more readable with local variables to avoid so many casts: <pre class="prettyprint"><code>int compare(const void *a, const void *b) { const int *aa = a; const int *bb = b; if (*aa < *bb) return -1; else if (*aa > *bb) return 1; else return 0; } </code></pre> Note that modern compilers will generate the same code with or without these local variables: always prefer the more readable form. A more compact solution with the same exact result is commonly used although a bit more difficult to understand: <pre class="prettyprint"><code>int compare(const void *a, const void *b) { const int *aa = a; const int *bb = b; return (*aa > *bb) - (*aa < *bb); } </code></pre> Note that this approach works for all numeric types, but will return <code>0</code> for NaN floating point values. As for your remark: I just cannot figure out why many books and web sites write in such a misleading way: <ul> <li>Many books and websites contain mistakes, and so do most programs. Many programming bugs get caught and squashed before they reach production if the program is tested wisely. Code fragments in books are not tested, and although they never reach production, the bugs they contain do propagate virally via unsuspecting readers who learn bogus methods and idioms. A very bad and lasting side effect. </li> <li>Kudos to you for catching this! You have a rare skill among programmers: you are a good reader. There are far more programmers who write code than programmers who can read code correctly and see mistakes. Hone this skill by reading other people's code, on stack overflow or from open source projects... And do report the bugs.</li> <li>The subtraction method is in common use, I have seen it in many places like you and it does work for most value pairs. This bug may go unnoticed for eons. A similar problem was latent in the zlib for decades: <code>int m = (a + b) / 2;</code> causes a fateful integer overflow for large <code>int</code> values of <code>a</code> and <code>b</code>.</li> <li>The author probably saw it used and thought the subtraction was cool and fast, worth showing in print.</li> <li>Note however that the erroneous function does work correctly for types smaller than <code>int</code>: <code>signed</code> or <code>unsigned</code> <code>char</code> and <code>short</code>, if these types are indeed smaller than <code>int</code> on the target platform, which the C Standard does not mandate.</li> <li>Indeed similar code can be found in The C Programming Language by Brian Kernighan and Dennis Ritchie, the famous K&R C bible by its inventors. They use this approach in a simplistic implementation of <code>strcmp()</code> in chapter 5. The code in the book is dated, going all the way back to the late seventies. Although it has implementation defined behavior, it does not invoke undefined behavior in any but the rarest architectures among which the infamous DeathStation-9000, yet it should not be used to compare <code>int</code> values.</li> </ul>

qsort function compare confused me

I see lots of people use subtraction in a qsort comparator function. I think it is wrong because when dealing with these numbers: int nums[]={-2147483648,1,2,3}; INT_MIN = -2147483648;

int compare (const void * a, const void * b)
{
  return ( *(int*)a - *(int*)b );
}

I wrote this function to test:

#include <stdio.h>
#include <limits.h>

int compare (const void * a, const void * b)
{
    return ( *(int*)a - *(int*)b );
}

int main(void)
{
    int a = 1;
    int b = INT_MIN;
    printf("%d %d\n", a,b);
    printf("%d\n",compare((void *)&a,(void *)&b));
    return 0;
}

The output is:

1 -2147483648
-2147483647

but a > b so the output should be positive。 I have seen many books write like this. I think it is wrong; it should be written like this when dealing with int types:

int compare (const void * a, const void * b)
{
    if(*(int *)a < *(int *)b)
        return -1;
    else if(*(int *)a > *(int *)b)
        return 1;
    else 
        return 0;
}

I just cannot figure out why many books and web sites write in such a misleading way. If you have any different view, please let me know.

How does Q sort work?

qsort will give each pair it needs to compare to cmpfunc and uses it's return value to see which one is greater and then sorts the array accordingly. Basically, if the compare function returns a positive result this means that the first argument is greater than the second one. Similarly, if it returns negative, then the second argument is greater.

What is the difference between CMP and Q sort in C++?

The cmp function returns a negative integer value if the first argument is less than the second, a positive integer value if the first argument is greater than the second and zero if the arguments are equal. The function takes void pointers so the qsort function may be used with any data type.

How does the comparison function work in a sorting algorithm?

(At a minimum.) See stackoverflow.com/questions/53821538/… for someone who encountered problems when they tried to usr it. Whenever the sorting algorithm needs to find out which of two elements should be placed before the other, it will call the comparison function and pass pointers to the two elements.

How to write comparator function of quick sort?

However, there exists an interesting approach with a little modification in comparator function of Quick Sort. The idea is to write a comparator function that takes two addresses p and q as arguments. Let l and r be the number pointed by p and q.

I think it is wrong

Yes, a simple subtraction can lead to int overflow which is undefined behavior and should be avoided.

return *(int*)a - *(int*)b;  // Potential undefined behavior.

A common idiom is to subtract two integer compares. Various compilers recognize this and create efficient well behaved code. Preserving const-ness also is good form.

const int *ca = a;
const int *cb = b;
return (*ca > *cb) - (*ca < *cb);

why many books and web sites write in such a misleading way.

return *a - *b; is conceptually easy to digest - even if it provides the wrong answer with extreme values - often learner code omits edge conditions to get the idea across - "knowing" that values will never be large.

Or consider the complexities of comparing long doubles with regard to NaN.

Your understanding is absolutely correct. This common idiom cannot be used for int values.

Your proposed solution works correctly, although it would be more readable with local variables to avoid so many casts:

int compare(const void *a, const void *b) {
    const int *aa = a;
    const int *bb = b;
    if (*aa < *bb)
        return -1;
    else if (*aa > *bb)
        return 1;
    else 
        return 0;
}

Note that modern compilers will generate the same code with or without these local variables: always prefer the more readable form.

A more compact solution with the same exact result is commonly used although a bit more difficult to understand:

int compare(const void *a, const void *b) {
    const int *aa = a;
    const int *bb = b;
    return (*aa > *bb) - (*aa < *bb);
}

Note that this approach works for all numeric types, but will return 0 for NaN floating point values.

As for your remark: I just cannot figure out why many books and web sites write in such a misleading way:

Many books and websites contain mistakes, and so do most programs. Many programming bugs get caught and squashed before they reach production if the program is tested wisely. Code fragments in books are not tested, and although they never reach production, the bugs they contain do propagate virally via unsuspecting readers who learn bogus methods and idioms. A very bad and lasting side effect.
Kudos to you for catching this! You have a rare skill among programmers: you are a good reader. There are far more programmers who write code than programmers who can read code correctly and see mistakes. Hone this skill by reading other people's code, on stack overflow or from open source projects... And do report the bugs.
The subtraction method is in common use, I have seen it in many places like you and it does work for most value pairs. This bug may go unnoticed for eons. A similar problem was latent in the zlib for decades: int m = (a + b) / 2; causes a fateful integer overflow for large int values of a and b.
The author probably saw it used and thought the subtraction was cool and fast, worth showing in print.
Note however that the erroneous function does work correctly for types smaller than int: signed or unsigned char and short, if these types are indeed smaller than int on the target platform, which the C Standard does not mandate.
Indeed similar code can be found in The C Programming Language by Brian Kernighan and Dennis Ritchie, the famous K&R C bible by its inventors. They use this approach in a simplistic implementation of strcmp() in chapter 5. The code in the book is dated, going all the way back to the late seventies. Although it has implementation defined behavior, it does not invoke undefined behavior in any but the rarest architectures among which the infamous DeathStation-9000, yet it should not be used to compare int values.

qsort function compare confused me

Tags:

c

qsort

52coder

People also ask

2 Answers

chux - Reinstate Monica

chqrlie

Recent Activity

Donate For Us

qsort function compare confused me

Tags:

c

qsort

52coder

People also ask

2 Answers

chux - Reinstate Monica

chqrlie

Related questions

Recent Activity

Donate For Us