
Can a string literal in C be modified?

I recently had a question. I know that the constant array a pointer is initialized to point to, as in the code below, is placed in the .rodata section, and that this region is read-only. However, I saw in the C11 standard that writing to this memory address is undefined behavior. I am aware that Borland's Turbo C compiler lets you write where the pointer points. Is this because the processor operated in real mode on some systems of the time, such as MS-DOS? Or is it independent of the processor's operating mode? Is there any other compiler that lets you write through the pointer without triggering a memory protection fault when the processor is in protected mode?

#include <stdio.h>

int main(void) {
    char *st = "aaa";
    *st = 'b'; 
    return 0;
}

When this code is compiled with Turbo C on MS-DOS, the write to memory succeeds.

Yuri Albuquerque asked Jun 24 '19 20:06


4 Answers

As has been pointed out, trying to modify a constant string in C results in undefined behavior. There are several reasons for this.

One reason is that the string may be placed in read-only memory. This allows it to be shared across multiple instances of the same program, and doesn't require the memory to be saved to disk if the page it's on is paged out (since the page is read-only and thus can be reloaded later from the executable). It also helps detect run-time errors by giving an error (e.g. a segmentation fault) if an attempt is made to modify it.

Another reason is that the string may be shared. Many compilers (e.g., gcc) will notice when the same literal string appears more than once in a compilation unit, and will share the same storage for it. So if a program modifies one instance, it could affect others as well.

There is also never a need to do this, since the same intended effect can easily be achieved by using a static character array. For instance:

#include <stdio.h>

int main(void) {
    static char st_arr[] = "aaa";
    char *st = st_arr;
    *st = 'b'; 
    return 0;
}

This does exactly what the posted code attempted to do, but without any undefined behavior. It also takes the same amount of memory. In this example, the string "aaa" is used as an array initializer, and does not have any storage of its own. The array st_arr takes the place of the constant string from the original example, but (1) it will not be placed in read-only memory, and (2) it will not be shared with any other references to the string. So it's safe to modify it, if in fact that's what you want.

Tom Karzes answered Sep 28 '22 05:09


Is there any other compiler that lets you write through the pointer without triggering a memory protection fault when the processor is in protected mode?

GCC 3 and earlier supported gcc -fwritable-strings, which let you compile old K&R C where this was apparently legal, according to https://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Incompatibilities.html. (It's undefined behaviour in ISO C and thus a bug in an ISO C program.) That option defines the behaviour of the assignment which ISO C leaves undefined.

GCC 3.3.6 manual - C Dialect options

-fwritable-strings
Store string constants in the writable data segment and don't uniquize them. This is for compatibility with old programs which assume they can write into string constants.

Writing into string constants is a very bad idea; “constants” should be constant.

GCC 4.0 removed that option (release notes); the last release in the GCC 3 series was GCC 3.4.6 in March 2006, although the option had apparently become buggy by that version.

gcc -fwritable-strings would treat string literals like non-const anonymous character arrays (see @gnasher's answer), so they go in the .data section instead of .rodata, and thus get linked into a segment of the executable that's mapped to read+write pages, not read-only. (Executable segments have basically nothing to do with x86 segmentation; each one is just a start+range memory mapping from the executable file to memory.)

And it would disable duplicate-string merging, so char *foo() { return "hello"; } and char *bar() { return "hello"; } would return different pointer values, instead of merging identical string literals.


Related:

  • How can some GCC compilers modify a constant char pointer?

  • https://softwareengineering.stackexchange.com/questions/294748/why-are-c-string-literals-read-only


Linker option: still Undefined Behaviour so probably not viable

On GNU/Linux, linking with ld -N (--omagic) will make the text (as well as data) section read+write. This may apply to .rodata even though modern GNU Binutils ld puts .rodata in its own section (normally with read but not exec permission) instead of making it part of .text. Having .text writeable could easily be a security problem: you never want a page with write+exec at the same time, otherwise some bugs like buffer overflows can turn into code-injection attacks.

To do this from gcc, use gcc -Wl,-N to pass on that option to ld when linking.

This doesn't change the fact that writing to a string literal is Undefined Behaviour. For example, the compiler will still merge duplicate strings, so writing through one char *foo = "hello"; will affect all other uses of "hello" in the whole program, even across files.


What to use instead:

If you want something writeable, use static char foo[] = "hello"; where the quoted string is just an array initializer for a non-const array. As a bonus, this is more efficient than static char *foo = "hello"; at global scope, because there's one fewer level of indirection to get to the data: it's just an array instead of a pointer stored in memory.

Peter Cordes answered Sep 28 '22 07:09


You are asking whether or not the platform may cause undefined behavior to be defined. The answer to that question is yes.

But you are also asking whether or not the platform defines this behavior. In fact it does not.

With some optimization options, the compiler will merge string constants, so that writing to one constant also changes every other use of that constant. I used this compiler once, and it was quite capable of merging strings.

Don't write this code. It's not good. You will regret writing code in this style when you move onto a more modern platform.

Joshua answered Sep 28 '22 07:09


Your literal "aaa" produces a static array of four char, 'a', 'a', 'a', '\0', in an anonymous location, and yields a pointer to the first 'a'. In C the array's type is char[4] rather than const char[4], but it must be treated as if it were const.

Trying to modify any of the four characters is undefined behaviour. Undefined behaviour can do anything: modify the char as intended, pretend to modify the char, do nothing, or crash.

It's basically the same as static const char anonymous[4] = { 'a', 'a', 'a', '\0' }; char* st = (char*) &anonymous [0];

gnasher729 answered Sep 28 '22 06:09