
What happens when a char array gets initialized from a string literal?

As I understand it, the following code works like so:

char* cptr = "Hello World";

"Hello World" lives in the .rodata section of the program's memory. The string literal "Hello World" yields a pointer to the base address of the string, that is, the address of the first element of the so-called "array". Since the chars are laid out sequentially in memory, that is the address of 'H'. Here is a little diagram of how I visualize the string literal being stored in memory:

0x4 : 'H'
0x5 : 'e'
0x6 : 'l'
0x7 : 'l'
0x8 : 'o'
0x9 : ' '
0xa : 'W'
0xb : 'o'
0xc : 'r'
0xd : 'l'
0xe : 'd'
0xf : '\0'

So the declaration above becomes:

char* cptr = 0x4;

Now cptr points to the string literal. I'm just making up the addresses.

0xa1 : 0x4

Now how does this code work?

char cString[] = "Hello World";

I am assuming that, as in the previous situation, "Hello World" also decays to the address of 'H', which is 0x4.

char cString[] = 0x4;

I am reading the = as an overloaded assignment operator when it is used to initialize a char array. As I understand it, at initialization of a C-string only, it copies char by char, starting at the given base address, into the C-string until '\0' has been copied as the last char. It also allocates enough memory for all the chars. Because overloaded operators are really just functions, I assume that its internal implementation is similar to strcpy().

I would like one of the more experienced C programmers to confirm my assumptions of how this code works. This is my visualization of the C-string after the chars from the string literal get copied into it:

0xb4 : 'H'
0xb5 : 'e'
0xb6 : 'l'
0xb7 : 'l'
0xb8 : 'o'
0xb9 : ' '
0xba : 'W'
0xbb : 'o'
0xbc : 'r'
0xbd : 'l'
0xbe : 'd'
0xbf : '\0'

Once again, the addresses are arbitrary. The point is that the C-string on the stack is distinct from the string literal in the .rodata section.

What am I trying to do? I am trying to use a char pointer to temporarily hold the base address of the string literal, and use that same char pointer (base address of string literal) to initialize the C-string.

char* cptr = "Hello World";
char cString[] = cptr;

I assume that "Hello World" evaluates to its base address, 0x4. So this code ought to look like this:

char* cptr = 0x4;
char cString[] = 0x4;

I assume that it should be no different from char cString[] = "Hello World"; since "Hello World" evaluates to its base address, and that is what is stored in the char pointer!

However, gcc gives me an error:

error: invalid initializer
char cString[] = cptr;
                 ^
  1. How come you can't use a char pointer as a temporary placeholder to store the base address of a string literal?
  2. How does this code work? Are my assumptions correct?
  3. Does using a string literal in the code return the base address to the "array" where the chars are stored in the memory?
Galaxy asked Jun 19 '18 22:06



1 Answer

Your understanding of memory layout is more or less correct. But the problem you are having is one of initialization semantics in C.

The = symbol in a declaration here is NOT the assignment operator. Instead, it is syntax that specifies the initializer for a variable being instantiated. In the general case, T x = y; is not the same as T x; x = y;.

There is a language rule that a character array can be initialized from a string literal. (The string literal is not "evaluated to its base address" in this context.) There is no language rule that an array can be initialized from a pointer to the elements intended to be copied into the array.

Why are the rules like this? "Historical reasons".

M.M answered Sep 23 '22 07:09
