
What is the reason behind the following C char array storage implementation?

Tags:

c

language-design

What is the implementation reason behind the following char array implementation?

char *ch1 = "Hello"; // Read-only data
/* if we try ch1[1] = ch1[2]; 
we will get **Seg fault** since the value is stored in 
the constant code segment */

char ch2[] = "World"; // Read-write data
/* if we try ch2[1] = ch2[2]; will work. */

According to the book Head First C (pages 73-74), the ch2[] array is stored both in the constant code segment and on the function stack. What is the reason for duplicating it in both code and stack memory? Why can't the value be kept only on the stack if it is not read-only data?


asked Aug 20 '15 06:08

Ashwin

1 Answer

First, let's clear something up. String literals are not necessarily read-only data; it's just that trying to change them is undefined behaviour.

It doesn't necessarily have to crash; it may work just fine. But, being undefined behaviour, you shouldn't rely on it if you want your code to run on another implementation, another version of the same implementation, or even next Wednesday.

This may well stem from a time before standards were in place (the original ANSI/ISO mandate was to codify existing practice rather than create a new language). In many implementations, strings would share space for efficiency, such as the code:

char *good = "successful";
char *bad = "unsuccessful";

resulting in:

good---------+
bad--+       |
     |       |
     V       V
   | u | n | s | u | c | c | e | s | s | f | u | l | \0 |

Hence, if you changed one of the characters in good, it would also change bad.
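Whether two literals really share storage is unspecified, so a program can only observe what its own compiler happened to do. A minimal sketch, assuming a made-up helper name (literals_overlap); the answer may differ between compilers and optimisation levels, and both answers are conforming. The pointers are declared const char * so the compiler rejects accidental writes:

```c
#include <stdio.h>

/* Hypothetical helper: reports whether this particular build overlaid
   "successful" on the tail of "unsuccessful". Returns 1 or 0. */
int literals_overlap(void)
{
    const char *good = "successful";
    const char *bad  = "unsuccessful";

    /* Skipping the "un" prefix stays inside the "unsuccessful" array,
       and comparing pointers for equality is always well defined. */
    return good == bad + 2;
}

/* Prints which layout this build chose. */
void report_overlap(void)
{
    printf("literals are %s\n", literals_overlap() ? "shared" : "separate");
}
```

Calling report_overlap() from main shows what your toolchain chose; nothing in the standard lets you depend on either outcome.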

The reason you can safely change the characters with something like:

char indifferent[] = "meh";

is that, while good and bad point to a string literal, that statement actually creates a character array big enough to hold "meh" and then copies the data into it¹. The copy of the data can be freely changed.
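The copy made by the array form can be checked directly. A small sketch (the function name modify_copy is made up for illustration) showing that the array has its own storage, sized to include the terminating null, and that writing to it is well defined:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: modifies a local array initialised from a literal,
   copies the result into out (which must hold at least 4 bytes), and
   returns the array's size in bytes. */
size_t modify_copy(char *out)
{
    char indifferent[] = "meh";   /* array of 4 chars: 'm', 'e', 'h', '\0' */

    indifferent[0] = 'M';         /* fine: this is the array's own storage */

    memcpy(out, indifferent, sizeof indifferent);
    return sizeof indifferent;    /* counts the terminating null as well */
}
```

The same assignment through a char * that points at the literal itself would be undefined behaviour, which is why that case is not shown running here.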

In fact the C99 rationale document explicitly cites this as one of the reasons:

String literals are not required to be modifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations.

But regardless of the why, the standard is quite clear on the what. From C11 6.4.5 String literals:

7/ It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

The array case (char indifferent[] = "meh";) is covered in 6.7.6 Declarators and 6.7.9 Initialization.

¹ Though it's worth noting that the normal "as if" rules apply here (as long as an implementation acts as if it's following the standard, it can do what it pleases).

In other words, if the implementation can detect that you never try to change the data, it can quite happily bypass the copy and use the original.

answered Sep 27 '22 23:09

paxdiablo
