 

Inconsistent strcmp() return value when passing strings as pointers or as literals

Tags:

c++

c

linux

strcmp

I was playing around with strcmp when I noticed this, here is the code:

    #include <string.h>
    #include <stdio.h>

    int main(){

        //passing strings directly
        printf("%d\n", strcmp("ahmad", "fatema"));

        //passing strings as pointers
        char *a= "ahmad";
        char *b= "fatema";
        printf("%d\n",strcmp(a,b));

        return 0;

    }

the output is:

    -1
    -5

Shouldn't strcmp work the same in both cases? Why am I given a different value when I pass the strings directly as "ahmad" than when I pass them as char* a = "ahmad"? When you pass values to a function, they are allocated on its stack, right?

Ahmad AL-wazzan asked Jan 03 '15 02:01





2 Answers

You are most likely seeing the result of a compiler optimization. If we test the code using gcc on godbolt, with -O0 optimization level, we can see for the first case it does not call strcmp:

    movl    $-1, %esi   #,
    movl    $.LC0, %edi #,
    movl    $0, %eax    #,
    call    printf  #

Since you are using constants as arguments to strcmp, the compiler is able to perform constant folding: it evaluates a compiler intrinsic at compile time and generates the -1 then, instead of calling strcmp at run time. The run-time strcmp is implemented in the standard library and will have a different implementation than the likely simpler compile-time version.

In the second case it does generate a call to strcmp:

    call    strcmp  #
    movl    %eax, %esi  # D.2047,
    movl    $.LC0, %edi #,
    movl    $0, %eax    #,
    call    printf  #

This is consistent with the fact that gcc has a builtin for strcmp, which is what gcc will use during constant folding.

If we further test using the -O1 optimization level or greater, gcc is able to fold both cases, and the result will be -1 for both:

    movl    $-1, %esi   #,
    movl    $.LC0, %edi #,
    xorl    %eax, %eax  #
    call    printf  #
    movl    $-1, %esi   #,
    movl    $.LC0, %edi #,
    xorl    %eax, %eax  #
    call    printf  #

With more optimization options turned on, the optimizer is able to determine that a and b also point to constants known at compile time, so it can compute the result of strcmp for this case at compile time too.

We can confirm that gcc is using a builtin function by building with the -fno-builtin flag and observing that a call to strcmp is generated in all cases.

clang is slightly different in that it does not fold at all using -O0 but will fold at -O1 and above for both.
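If you want to observe the library's run-time result without changing compiler flags, one possible approach is to call strcmp through a volatile function pointer. This is only a sketch: runtime_strcmp is a hypothetical helper name, and the assumption is that the optimizer cannot fold a call whose target it has to reload from a volatile object.

```c
#include <string.h>

/* Hypothetical helper, not part of any library.
 * The function pointer is volatile, so the compiler must read it at run
 * time and (by assumption) cannot replace the call with a folded constant. */
int runtime_strcmp(const char *s1, const char *s2)
{
    int (*volatile compare)(const char *, const char *) = strcmp;
    return compare(s1, s2);
}
```

Calling runtime_strcmp("ahmad", "fatema") should give a negative value at any optimization level, though the exact magnitude still depends on the library implementation.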

Note that any negative result is entirely conformant. We can see this by going to the draft C99 standard, section 7.21.4.2 The strcmp function, which says:

int strcmp(const char *s1, const char *s2); 

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

technosurus points out that strcmp is specified to treat the strings as if they were composed of unsigned char. This is covered in C99 under 7.21.1, which says:

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).
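As an illustration of the unsigned char rule, here is a minimal sketch of a byte-by-byte comparison that returns the difference of the first mismatching bytes. bytewise_strcmp is a hypothetical name, and this is not the code of any particular C library; real implementations are usually heavily optimized and may return other values with the same sign.

```c
/* Hypothetical reference implementation for illustration only. */
int bytewise_strcmp(const char *s1, const char *s2)
{
    /* Compare as unsigned char, as C99 7.21.1 requires. */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the bytes match and the end has not been reached. */
    while (*p1 != '\0' && *p1 == *p2) {
        ++p1;
        ++p2;
    }

    /* Difference of the first mismatching bytes (0 if the strings are equal). */
    return (int)*p1 - (int)*p2;
}
```

On an ASCII system, bytewise_strcmp("ahmad", "fatema") returns 'a' - 'f', which is -5. A library that computes its result this way would explain the -5 seen in the question, while a folded result of -1 is equally valid.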

Shafik Yaghmour answered Sep 18 '22 13:09



I think you believe that the value returned by strcmp should somehow depend on the input strings passed to it in a way that is not defined by the function specification. This isn't correct. See for instance the POSIX definition:

http://pubs.opengroup.org/onlinepubs/009695399/functions/strcmp.html

Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.

This is exactly what you are seeing. The implementation does not need to make any guarantee about the exact return value, only that it is less than zero, equal to zero, or greater than zero as appropriate.
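In practice this means code should only test the sign of the result. If a fixed value is needed (for example, to print or store it), one option is to normalize it yourself. The sketch below uses a hypothetical helper name, strcmp_sign, which is not a standard function:

```c
#include <string.h>

/* Hypothetical helper: collapses any strcmp result to -1, 0, or 1. */
int strcmp_sign(const char *s1, const char *s2)
{
    int result = strcmp(s1, s2);

    /* (result > 0) and (result < 0) each evaluate to 0 or 1. */
    return (result > 0) - (result < 0);
}
```

With this helper, both forms in the question give the same value regardless of whether the compiler folds the call or the library computes it.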

davmac answered Sep 17 '22 13:09
