I read that string literals are always stored in read only memory and it makes sense as to why.
However, if I initialize a character array using a string literal, it still stores the string literal in read-only memory and then copies it into the memory location of the character array.
My question is: in this scenario, why bother storing the string literal in read-only memory in the first place? Why not store it directly in the memory location of the character array?
"I read that string literals are always stored in read only memory and it makes sense as to why."
Not necessarily. The storage location of string literals is implementation-defined. If a compiler decides to emit a large string literal, it will usually be placed in a read-only section of static memory, such as .rodata.
However, whether this is even necessary is up to the compiler. Compilers are allowed to optimize your code according to the as-if rule, so if the observable behavior of the program is the same with the literal stored elsewhere, or nowhere at all, that is also allowed. Take this function, for example:
int sum() {
    char arr[] = "ab";
    return arr[0] + arr[1];
}
This compiles to the following assembly output:
sum():
     mov eax, 195
     ret
In this case, because everything is a compile-time constant, there is no string literal or array at all. The compiler optimized both away and turned our code into return 195; by summing the ASCII codes of the two characters a (97) and b (98).
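To check this without reading assembly, here is a minimal sketch, assuming C++14 or later and an ASCII-compatible character set. It is a variant of sum() marked constexpr (that keyword is my addition, not part of the original example), so the compiler has to be able to compute the result itself:

```cpp
#include <cassert>

// Variant of sum() from above; constexpr is added here for illustration only.
constexpr int sum() {
    char arr[] = "ab";        // local array initialized from a string literal
    return arr[0] + arr[1];   // 'a' + 'b'
}

// Assumes ASCII: 'a' == 97 and 'b' == 98, so the sum is 195.
static_assert(sum() == 195, "sum() is a compile-time constant");
```

If the static_assert compiles, the value was computed at compile time, which is the same reasoning the optimizer applies to the non-constexpr version. A run-time assert(sum() == 195) holds as well.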
void consume(const char*);
void short_string() {
    char arr[] = "short str";
    consume(arr);
}
This compiles to:
short_string():
        sub     rsp, 24
        movabs  rax, 8391086215229565043
        mov     qword ptr [rsp + 8], rax
        mov     word ptr [rsp + 16], 114
        lea     rdi, [rsp + 8]
        call    consume(char const*)@PLT
        add     rsp, 24
        ret
Once again, no code was emitted that would keep the string in read-only memory, but it also wasn't optimized away completely. The compiler sees that the string short str is very short, so it treats the first eight ASCII bytes as the number 8391086215229565043 and stores them on the stack with a single mov. The remaining character r and the null terminator follow as the 16-bit value 114. consume() is then called with a pointer to stack memory.
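The two immediates in the assembly can be reproduced by hand. The following is a rough sketch, assuming C++14 or later and ASCII; pack_le and text are hypothetical names of mine, not something the compiler emits. It packs bytes in little-endian order, lowest address first, which is how x86-64 lays out an integer in memory:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: packs `count` bytes (at most 8) into an integer,
// with the byte at the lowest address in the least significant position.
constexpr std::uint64_t pack_le(const char* bytes, std::size_t count) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value |= static_cast<std::uint64_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    }
    return value;
}

// 9 characters plus the null terminator: 10 bytes in total.
constexpr char text[] = "short str";

// The first 8 bytes, "short st", match the movabs immediate.
static_assert(pack_le(text, 8) == 8391086215229565043ULL, "first qword");
// The last 2 bytes, 'r' and '\0', match the 16-bit store of 114.
static_assert(pack_le(text + 8, 2) == 114, "trailing word");
```

Both checks pass at compile time, so the ten bytes of the array are fully accounted for by the two stores.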
void long_string() {
    char arr[] = "Lorem ipsum dolor [...] est laborum.";
    consume(arr);
}
This compiles to:
long_string():
        push    rbx
        sub     rsp, 448
        lea     rsi, [rip + .L__const.long_string().arr]
        mov     rbx, rsp
        mov     edx, 446
        mov     rdi, rbx
        call    memcpy@PLT
        mov     rdi, rbx
        call    consume(char const*)@PLT
        add     rsp, 448
        pop     rbx
        ret
.L__const.long_string().arr:
        .asciz  "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
Our string is now much too long to be treated as a number or two. The entire string will now be emitted into static memory, most likely .rodata after linking. Emitting it is still useful, because the compiler can use memcpy to copy it from static memory onto the stack when initializing arr.
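Written out by hand, the initialization in this case looks roughly like the sketch below. kInitializer is a hypothetical stand-in for the constant the compiler named .L__const.long_string().arr, shortened here for readability, and array_is_independent_copy is an illustrative name of mine:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the compiler-emitted constant in static memory.
static const char kInitializer[] = "Lorem ipsum";

// Mimics `char arr[] = "...";` the way the assembly does it: reserve stack
// space, then memcpy the constant into it. Returns true if the array is a
// writable copy that is independent of the constant.
bool array_is_independent_copy() {
    char arr[sizeof kInitializer];
    std::memcpy(arr, kInitializer, sizeof kInitializer);

    arr[0] = 'l';  // modifying the stack copy is fine

    return arr[0] == 'l'                                     // the copy changed
        && kInitializer[0] == 'L'                            // the constant did not
        && std::strcmp(arr + 1, kInitializer + 1) == 0;      // the rest is identical
}
```

The constant stays read-only and untouched, while each call gets its own writable copy on the stack.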
If you're worried about compilers doing something wasteful here, don't be. Modern compilers are very good at optimizing and deciding which symbols go where, and if they emit a string literal, this is usually because it must exist for some other code to work, or because it makes initialization of an array easier.
See live examples with Compiler Explorer