
Why do (only) some compilers use the same address for identical string literals?

https://godbolt.org/z/cyBiWY

I can see two 'some' literals in assembler code generated by MSVC, but only one with clang and gcc. This leads to totally different results of code execution.

    static const char *A = "some";
    static const char *B = "some";

    void f() {
        if (A == B) {
            throw "Hello, string merging!";
        }
    }

Can anyone explain the difference and similarities between those compilation outputs? Why does clang/gcc optimize something even when no optimizations are requested? Is this some kind of undefined behaviour?

I also notice that if I change the declarations to those shown below, clang/gcc/msvc do not leave any "some" in the assembler code at all. Why is the behaviour different?

    static const char A[] = "some";
    static const char B[] = "some";
Asked by Eugene Kosov, Oct 15 '18 at 10:10





2 Answers

This is not undefined behavior, but unspecified behavior. For string literals,

The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. That means that identical string literals may or may not compare equal when compared by pointer.

So the result of A == B may be either true or false, and you shouldn't depend on either outcome.

From the standard, [lex.string]/16:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
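Because the address comparison is unspecified, code that needs to know whether two literals hold the same text should compare the characters instead. A minimal sketch of that idea follows; the helper name same_text is made up for illustration, and the compiler behaviour mentioned in the comments is only what the question reports.

```cpp
#include <cassert>
#include <cstring>

static const char *A = "some";
static const char *B = "some";

// Hypothetical helper: true when both strings hold the same characters.
bool same_text(const char *lhs, const char *rhs) {
    return std::strcmp(lhs, rhs) == 0;
}

void f() {
    // Portable: holds whether or not the compiler merged the two literals.
    assert(same_text(A, B));

    // Not portable: A == B is true if the literals were merged (as the
    // question reports for clang/gcc) and false otherwise (as reported
    // for MSVC). The standard permits either value.
    bool merged = (A == B);
    (void)merged;  // deliberately unused: no portable code should branch on it
}
```

Calling f() succeeds on any conforming compiler, because only the content comparison is asserted.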

Answered by songyuanyao, Oct 12 '22 at 18:10



The other answers explained why you cannot expect the pointer addresses to be different. Yet you can easily rewrite this in a way that guarantees that A and B don't compare equal:

    static const char A[] = "same";
    static const char B[] = "same"; // but different

    void f() {
        if (A == B) {
            throw "Hello, string merging!";
        }
    }

The difference is that A and B are now arrays of characters. They aren't pointers, and their addresses have to be distinct, just as those of two integer variables would have to be. C++ blurs this distinction because it makes pointers and arrays seem interchangeable (operator* and operator[] seem to behave the same), but they really are different. E.g. something like const char *A = "foo"; A++; is perfectly legal, but const char A[] = "bar"; A++; isn't.
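To make the array case concrete, here is a small sketch under the same declarations as above. The helper names arrays_share_storage and second_char are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

static const char A[] = "same";
static const char B[] = "same";

// Hypothetical helper: true if the two arrays started at the same address.
bool arrays_share_storage() {
    // A and B are two distinct objects, so their first elements
    // cannot live at the same address.
    return &A[0] == &B[0];
}

// Hypothetical helper: shows that a pointer can be advanced while an array cannot.
char second_char() {
    const char *p = A;  // array-to-pointer conversion: p points at A[0]
    ++p;                // legal: p is a modifiable pointer
    // ++A;             // would not compile: an array is not a modifiable lvalue
    return *p;
}

void f() {
    assert(!arrays_share_storage());
    assert(second_char() == 'a');
}
```

Comparing &A[0] with &B[0] spells out the conversion that A == B performs implicitly on the two arrays.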

One way to think about the difference is that char A[] = "..." says "give me a block of memory and fill it with the characters ... followed by \0", whereas char *A = "..." says "give me an address at which I can find the characters ... followed by \0".
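The "block of memory" versus "address" picture also shows up in sizeof and in whether the characters can be modified. The following is an illustrative sketch only; all function names in it are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Array form: the variable itself is the block of characters.
std::size_t array_form_size() {
    const char text[] = "some";
    return sizeof(text);  // four characters plus the terminating '\0'
}

// Pointer form: the variable holds only an address; the characters live elsewhere.
std::size_t pointer_form_size() {
    const char *text = "some";
    return sizeof(text);  // size of a pointer, independent of the string length
}

// A non-const array is its own copy of the characters, so writing to it is fine.
char capitalize_first() {
    char text[] = "some";
    text[0] = 'S';
    return text[0];
}

void demo() {
    assert(array_form_size() == 5);
    assert(pointer_form_size() == sizeof(const char *));
    assert(capitalize_first() == 'S');
}
```

Writing through a pointer to the literal itself is a different matter: the literal's characters are const in C++, so only the array copy may be modified.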

Answered by tobi_s, Oct 12 '22 at 17:10
