How does an array of strings in C look in memory?

Tags:

arrays

c

I'm trying to figure out what a 2D char array looks like in memory. For example:

    char   c[][5]={"xa","ccc","bb","j","a","d"};

    printf("TEST: %u %u %u %u \n\n",c[0],*c[0],c[0]+1,*(c[0]+1));

output:

TEST: 3214246874 120 3214246875 97

c[0] (that is, *(c+0)) is the string "xa" and evaluates to 3214246874, so I guess c[0] is the address of the char array "xa". When I apply * to c[0], I get 120, which is 'x' in ASCII.

So I think the first slot in array c holds an address pointing to the char x. After that I tried the same with c[0]+1: it printed the next address, and when I applied * to it I got 97, which is 'a' in ASCII.

So I assumed the array c looks like this:

c[0]                              c[1]
------------------------------------------------------------------
| pointer to x | pointer to a ||| pointer to c | pointer to c | etc ...
----------------------------------------------------------------------

But I searched the web and didn't find any proof of my assumption.

Daniel2708 asked Jan 09 '17 16:01


1 Answer

You are conflating two senses of the term "string" as it is used in C.

Most correctly, a C string is a null-terminated array of char. You have declared an array of char arrays, and initialized it with null-terminated char sequences. It is perfectly reasonable to characterize this as an "array of strings".

Arrays are not at all the same thing as pointers, however. The elements of your array are other arrays, each one (in your case) five chars long. This is where the other sense of the term "string" comes in. C arrays are a bit slippery: if you evaluate a (sub-)expression of array type, it evaluates to a pointer to the first array element (the main exceptions are when it is the operand of sizeof or of the unary & operator). In the case of strings, such a pointer has type char *, and so it is common to refer to pointers into strings as strings themselves. That is a colloquialism, however, and you will get yourself into trouble if you do not recognize the difference between the two related meanings.

Breaking down your example code:

    char   c[][5]={"xa","ccc","bb","j","a","d"};

    printf("TEST: %u %u %u %u \n\n",c[0],*c[0],c[0]+1,*(c[0]+1));
  • The expression c[0] designates an array of five char. When evaluated in the context of the function call expression, it becomes a pointer to the first element of the array. This value is of type char *, which is not the correct type for the corresponding printf field descriptor, %u. Undefined behavior results. You could correct this by casting the argument to void * and changing the field descriptor to %p.

  • Given that c[0] evaluates to a pointer to the first char of the first member array, it follows that the expression *c[0] evaluates to the pointed-to char, which is promoted to int when passed to printf. This value again fails to match the corresponding field descriptor, which should be %c; you should then expect 'x' to be printed. Alternatively, you could cast the value: (unsigned int)*c[0]. In that case, you would expect the numeric code for 'x' to be printed; that is very likely to be 120. That 120 is in fact the value actually printed is an inconsequential detail of how the undefined behavior of your program happens to manifest.

  • Again given that c[0] evaluates to a pointer to the first char of the first member array, it follows that c[0] + 1 is a pointer addition, resulting in a pointer to the second char in that array. As with c[0], this does not match the format.

  • And presumably it will be clear by this point that *(c[0] + 1) evaluates to the second char (at index 1) in array c[0]. The expression is rigorously equivalent to c[0][1]. This again does not match the format.

so I assumed the array c looks like this [...]

Nope. The array looks like this:

| c[0]         | c[1]         | c[2]         | c[3]         | c[4]         | c[5]         |
  x  a \0 \0 \0  c  c  c \0 \0  b  b \0 \0 \0  j \0 \0 \0 \0  a \0 \0 \0 \0  d \0 \0 \0 \0
John Bollinger answered Sep 18 '22 02:09