
When consolidating duplicate literals, will a C compiler look in the middle of a string?

I have a lot of string literals in my source code that are identical except for leading whitespace (due to a desire to maintain correct indentation). Are compilers smart enough to see that they can reuse the same space in memory for both, just offsetting one string by a couple of bytes?

cleong asked Nov 26 '12 10:11



2 Answers

ISO C99, 6.5.2.5 Compound literals, footnote:

83) This allows implementations to share storage for string literals and constant compound literals with the same or overlapping representations.

Omkant answered Nov 15 '22 08:11


Short answer: probably.

Long answer: it depends on the implementation. Typically, C compilers have an optimization called "string pooling" or similar, which lets the compiler store all string literals adjacently in read-only memory.

The contents of that string pool may then be optimized: the very same string appearing twice will almost certainly be stored only once. I think that most compilers will also be smart enough to recognize substrings, at least when the shorter string is the tail of the longer one, since both must end at the same null terminator. But there are also platform considerations such as alignment, so just because a substring exists, it doesn't necessarily mean that reusing that memory location is the most efficient choice.

There is nothing in the C standard that guarantees that such optimizations are done. But at the same time, there is nothing in the standard preventing it either.

To be sure, you have to check your specific compiler's documentation, or disassemble your program, or check the linker output.

Lundin answered Nov 15 '22 07:11