
How does C get the array offset right for an array of strings?

Tags: arrays, c, string

I'm doing something for class where I want to use a different format string based on certain conditions. I defined it like so:

const char *fmts[] = {"this one is a little long", "this one is short"};

later, I can use

printf(fmts[0]);

or

printf(fmts[1]);

and it works.

Is the compiler doing something for us? My guess is that it would take the longest string and store all of them aligned like that. But I'd like to know from someone who knows. Thanks

Derrick asked Nov 28 '22 18:11



1 Answer

It does it the same way as for any other data type. An array of "strings" is actually an array of character pointers, which all have the same size. So, in order to get the right address for the pointer, it multiplies the index by the size of an individual element, then adds that to the base address.

Your array will look like this:

      <same-size>
      +---------+
fmts: | fmts[0] | ------+
      +---------+       |
      | fmts[1] | ------|--------------------------+
      +---------+       |                          |
                        V                          V
                        this one is a little long\0this one is short\0

The characters of the strings themselves are not stored in the array; they exist elsewhere. As you have it, the string literals are usually placed in read-only memory, although you could also malloc the storage, or define a string as a modifiable character array with something like:

char f0[] = "you can modify me without invoking undefined behaviour";

You can see this in operation with the following code:

#include <stdio.h>
const char *fmts[] = {
    "This one is a little long",
    "Shorter",
    "Urk!"
};
int main (void) {
    printf ("Address of fmts[0] is %p\n", (void*)(&(fmts[0])));
    printf ("Address of fmts[1] is %p\n", (void*)(&(fmts[1])));
    printf ("Address of fmts[2] is %p\n", (void*)(&(fmts[2])));

    printf ("\n");

    printf ("Content of fmts[0] (%p) is %c%c%c...\n",
        (void*)(fmts[0]), *(fmts[0]+0), *(fmts[0]+1), *(fmts[0]+2));
    printf ("Content of fmts[1] (%p) is %c%c%c...\n",
        (void*)(fmts[1]), *(fmts[1]+0), *(fmts[1]+1), *(fmts[1]+2));
    printf ("Content of fmts[2] (%p) is %c%c%c...\n",
        (void*)(fmts[2]), *(fmts[2]+0), *(fmts[2]+1), *(fmts[2]+2));

    return 0;
}

which outputs:

Address of fmts[0] is 0x40200c
Address of fmts[1] is 0x402010
Address of fmts[2] is 0x402014

Content of fmts[0] (0x4020a0) is Thi...
Content of fmts[1] (0x4020ba) is Sho...
Content of fmts[2] (0x4020c2) is Urk...

Here you can see that the actual addresses of the array elements are equidistant, each one pointer-size (4 bytes on this platform) from the next: 0x40200c + 4 = 0x402010, 0x402010 + 4 = 0x402014.

However, the values are not, because they refer to differently sized strings. In this case the strings sit in a single memory block (this is not required by any means), as shown below, with the * characters indicating the start and end of individual strings:

         |  +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +a +b +c +d +e +f +0123456789abcdef
---------+-------------------------------------------------------------------
0x4020a0 | *54 68 69 73 20 6f 6e 65 20 69 73 20 61 20 6c 69  This one is a li
0x4020b0 |  74 74 6c 65 20 6c 6f 6e 67 00*53 68 6f 72 74 65  ttle long.Shorte
0x4020c0 |  72 00*55 72 6b 21 00*                            r.Urk!.
paxdiablo answered Dec 19 '22 08:12
