
How does the compiler know the string literal (const char*) already exists in data memory?

Tags: c++, c

I have learned that

char szA[] = "abc";

uses "abc" to initialize the array szA, so (for a local variable) the array lives on the stack and is destroyed when the function returns.

On the other hand, consider:

const char* szB = "abc";

Here "abc" is stored in the data section, like static variables, and szB just holds its address.

At this point, I wondered:

If I try

int i = 0;
while (i++ < 1000000) {
    const char* szC = "hello";
}

will this create 1000000 copies of "hello" in the data section?

To figure this out, I wrote some test code:

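#include <cstdint>
#include <iostream>

using namespace std;

// A string literal is an array of const char, so it is passed as const char*.
const char* testA(const char* arr)
{
    return arr;
}

const char* testB(const char* arr)
{
    return arr;
}

// Prints the address as a decimal integer without truncating the pointer.
void printAddress(const char* p)
{
    cout << reinterpret_cast<uintptr_t>(p) << endl;
}

int main()
{
    cout << "testA---------------\n";
    printAddress(testA("abc"));
    printAddress(testA("cba"));

    cout << "testB---------------\n";
    printAddress(testB("abc"));
    printAddress(testB("cba"));

    cout << "local---------------\n";
    const char* pChA = "abc";
    printAddress(pChA);
    const char* pChB = "cba";
    printAddress(pChB);
}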

And the result is:

testA---------------
9542604
9542608
testB---------------
9542604
9542608
local---------------
9542604
9542608

So apparently there is only one copy of each string literal in data memory.

But how does the compiler know that the string literal (const char*) already exists in data memory?

KID asked Feb 18 '18 07:02


1 Answer

When their storage is actually needed, string literals are translated to static arrays of characters. This happens at compile time. Your loop cannot allocate static memory a million times; it's just not possible. Static storage is laid out once, before the program runs, and each iteration of the loop merely loads the same address into the pointer.

The compiler can allocate static memory for each string literal that it sees in the source code. It may use the same static memory for identical string literals, so after const char* p = "Hello"; const char* q = "Hello"; p and q may or may not be equal. It may also use the same static memory for the same sequence of bytes, so after const char* p = "Hello"; const char* q = "ello"; &p[1] and &q[0] may or may not be equal.

How well the compiler reuses the same static memory depends on the quality of the compiler. It can simply keep track of all string literals, delay code generation until it has seen every string literal in a compilation unit, then merge equal strings to the same address, merge suffixes such as "ello" into "Hello", and emit only the string literals that are actually needed.

Also, for something like sizeof("Hello") or "Hello"[2], no static memory needs to be created at all, because the compiler can compute the value itself. For a pointer comparison such as p == "Hello" or "Hello" == "Hello", whether the literals share storage is up to the compiler, so it can just say the result is false without allocating memory.

gnasher729 answered Oct 05 '22 04:10