 

Is storage for string literals with the same content guaranteed to be the same?

Is the code below safe? It might be tempting to write code akin to this:

#include <map>

const std::map<const char*, int> m = {
    {"text1", 1},
    {"text2", 2}
};

int main () {
    volatile const auto a = m.at("text1");
    return 0;
}

The map is intended to be used with string literals only.

I think it's perfectly legal, and it seems to work. However, I have never seen a guarantee that the pointer for a literal used in two different places will be the same. I couldn't get the compiler to generate two separate pointers for literals with the same content, so I started to wonder how firm that assumption is.

I am only interested in whether literals with the same content can have different pointers. Or, more formally: can the code above throw (that is, can m.at throw std::out_of_range)?

I know there are ways to write this code so that it is guaranteed to work, and I think the above approach is dangerous because the compiler could decide to assign two different storage locations for the same literal, especially if the uses are placed in different translation units. Am I right?

luk32 asked Sep 20 '18 11:09


4 Answers

Whether or not two string literals with the exact same content are the exact same object is unspecified and, in my opinion, best not relied upon. To quote the standard:

[lex.string]

16 Evaluating a string-literal results in a string literal object with static storage duration, initialized from the given characters as specified above. Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

If you wish to avoid the overhead of std::string, you can write a simple view type (or use std::string_view in C++17) that acts as a reference type over a string literal. Use it to compare the characters themselves instead of relying on literal identity.

StoryTeller - Unslander Monica answered Nov 16 '22 04:11


The Standard does not guarantee that the addresses of string literals with the same content will be the same. In fact, [lex.string]/16 says:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

The second part even says you might not get the same address when a function containing a string literal is called a second time, though I've never seen a compiler do that.

So using the same character array object when a string literal is repeated is an optional compiler optimization. With my installation of g++ and default compiler flags, I also find I get the same address for two identical string literals in the same translation unit. But as you guessed, I get different ones if the same string literal content appears in different translation units.


A related interesting point: it's also permitted for different string literals to use overlapping arrays. That is, given

const char* abcdef = "abcdef";
const char* def = "def";
const char* def0gh = "def\0gh";

it's possible you might find abcdef+3, def, and def0gh are all the same pointer.

Also, this rule about reusing or overlapping string literal objects applies only to the unnamed array object directly associated with the literal, which is used if the literal immediately decays to a pointer or is bound to a reference to an array. A literal can also be used to initialize a named array, as in

const char a1[] = "XYZ";
const char a2[] = "XYZ";
const char a3[] = "Z";

Here the array objects a1, a2 and a3 are initialized from the literals, but are considered distinct from the actual literal storage (if such storage even exists) and follow the ordinary object rules, so the storage for those arrays will not overlap.

aschepler answered Nov 16 '22 03:11


No, the C++ standard makes no such guarantees.

That said, if the code is in a single translation unit, it would be difficult to find a counterexample. If main() is in a different translation unit, a counterexample might be easier to produce.

If the map is in a different dynamically linked library or shared object, then the pointers will almost certainly differ.

The volatile qualifier is a red herring.

Bathsheba answered Nov 16 '22 04:11


The C++ standard does not require an implementation to de-duplicate string literals.

When a string literal resides in another translation unit or another shared library, de-duplicating it would be the job of the linker (ld) or the runtime linker (ld.so), which they don't do.

Maxim Egorushkin answered Nov 16 '22 02:11