"strlen(s1) - strlen(s2)" is never less than zero

Tags: c, string, unsigned, debugging

I am currently writing a C program that requires frequent comparisons of string lengths, so I wrote the following helper function:

int strlonger(char *s1, char *s2) {
    return strlen(s1) - strlen(s2) > 0;
}

I have noticed that the function returns true even when s1 is shorter than s2. Can someone please explain this strange behavior?


asked May 06 '12 22:05

Adrian Monk

2 Answers

What you've come across is some peculiar behavior that arises in C when handling expressions that contain both signed and unsigned quantities.

When an operation is performed where one operand is signed and the other is unsigned, and the unsigned type's conversion rank is at least that of the signed type (as with int and unsigned int, or with int and size_t on typical platforms), C implicitly converts the signed operand to unsigned and performs the operation as if both numbers were nonnegative. This convention often leads to unintuitive behavior for relational operators such as < and >.

Regarding your helper function, note that strlen returns type size_t, an unsigned type, so both the difference and the comparison are computed using unsigned arithmetic. When s1 is shorter than s2, the difference strlen(s1) - strlen(s2) would mathematically be negative, but in unsigned arithmetic it wraps around to a large positive value, which is greater than 0. Thus,

return strlen(s1) - strlen(s2) > 0;

returns 1 whenever the two lengths differ, including when s1 is shorter than s2. To fix your function, compare the lengths directly instead of subtracting them:

return strlen(s1) > strlen(s2);

Welcome to the wonderful world of C! :)

Additional Examples

Since this question has recently received a lot of attention, I'd like to provide a few (simple) examples, just to make sure I am getting the idea across. I will assume a machine with 32-bit int and unsigned types using two's complement representation.

The important concept to understand when working with unsigned and signed variables in C is that if a single expression mixes unsigned and signed quantities of the same rank (such as int and unsigned int), the signed values are implicitly converted to unsigned.

Example #1:

Consider the following expression:

-1 < 0U

Since the second operand is unsigned, the first one is implicitly converted to unsigned, and hence the expression is equivalent to the comparison,

4294967295U < 0U

which of course is false. This is probably not the behavior you were expecting.

Example #2:

Consider the following code that attempts to sum the elements of an array a, where the number of elements is given by parameter length:

int sum_array_elements(int a[], unsigned length) {
    int i;
    int result = 0;

    for (i = 0; i <= length-1; i++)
        result += a[i];
    return result;
}

This function is designed to demonstrate how easily bugs can arise from the implicit conversion of signed values to unsigned. It seems quite natural to pass parameter length as unsigned; after all, who would ever want to use a negative length? The stopping criterion i <= length-1 also seems quite intuitive. However, when run with argument length equal to 0, the combination of these two yields an unexpected outcome.

Since parameter length is unsigned, the computation 0-1 is performed using unsigned arithmetic, which is modular (here, modulo 2^32). The result is therefore UMax, the largest unsigned value (4294967295 on our assumed machine). The <= comparison is also performed as an unsigned comparison, with i converted to unsigned, and since every unsigned value is less than or equal to UMax, the comparison always holds. Thus, the loop keeps running past the end of the array, and the code will attempt to access invalid elements of array a.

The code can be fixed either by declaring length to be an int, or by changing the test of the for loop to be i < length.

Conclusion: When Should You Use Unsigned?

I don't want to state anything too controversial here, but here are some of the rules I often adhere to when I write programs in C.

DON'T use unsigned just because a number is nonnegative. It is easy to make mistakes, and these mistakes are sometimes incredibly subtle (as illustrated in Example #2).
DO use unsigned when performing modular arithmetic.
DO use unsigned when using bits to represent sets. This is often convenient because right shifts of unsigned values are logical, so no sign extension takes place.

Of course, there may be situations in which you decide to go against these "rules". But more often than not, following these suggestions will make your code easier to work with and less error-prone.

answered Nov 02 '22 02:11

Alex Lockwood

strlen returns a size_t, which is a typedef for an unsigned type.

So,

(unsigned)4 - (unsigned)7 == (unsigned)-3

All unsigned values are greater than or equal to 0. Try converting each value returned by strlen to long int before subtracting.

answered Nov 02 '22 04:11

pmg
