
Confusing pointers in C

Tags: c, pointers

I have more than one doubt so please bear with me. Can someone tell me why this code fails?

#include<stdio.h>
void main(int argc,char **argv) /*assume program called with arguments aaa bbb ccc*/
{
    char **list={"aaa","bbb","ccc"};

    printf("%s",argv[1]);/*prints aaa*/
    printf("%s",list[1]); /*fails*/ 
}

I assumed it had something to do with the pointer-to-pointer stuff, which I do not understand clearly. So I tried:

#include<stdio.h>
void main()
{
char **list={"aaa","bbb","ccc"};
char *ptr;
ptr=list;
printf("%s",ptr);/*this prints the first string aaa*/
    /* My second question is how do i increment the value
       of ptr so that it points to the second string bbb*/
}

What is the difference between char *list[] and char **list, and in what situations is each the right choice? One more thing confuses me: is argv special? When I passed char **list to another function, assuming it would let me access the contents the way I could with argv, it also failed.

I realize similar questions have been asked in the past, but I can't seem to find what I need. If so, can someone please post the relevant links?

Eby John asked Jan 21 '10 11:01



2 Answers

You should use char *list[]={"aaa","bbb","ccc"}; instead of char **list={"aaa","bbb","ccc"};. You use char* list[] = {...}; to declare the array of pointers, but you use char** to pass a pointer to one or more pointers to a function.

  • T* x[] = array of pointers
  • T** x = pointer to pointer

P.S. Responding to ejohn: There is only one use that I can think of for creating a pointer to a pointer (as an actual declared variable, not as a function parameter or temporary created by the unary & operator): a handle. In short, a handle is a pointer to a pointer, where the handle is owned by the user but the pointer it points to can be changed as needed by the OS or a library.

Handles were used extensively throughout the old Mac OS. Since Mac OS was developed without virtual memory technology, the only way to keep the heap from quickly getting fragmented was to use handles in almost all memory allocations. This let the OS move memory as needed to compact the heap and open up larger, contiguous blocks of free memory.

Truth is, this strategy at best just "sucked less". There is a long list of disadvantages:

  • A common bug was when programmers would dereference the handle to a pointer, and use that pointer for several function calls. If any of those function calls moved memory, there was a chance that the pointer would become invalid, and dereferencing it would corrupt memory and possibly crash the program. This is an insidious bug, since dereferencing the bad pointer would not result in a bus error or segmentation fault, since the memory itself was still existent and accessible; it just was no longer used by the object you were using.
  • For this reason, the compiler had to be extra careful and some Common Subexpression Elimination optimizations couldn't be taken (the common subexpression being the handle dereference to a pointer).
  • So, in order to ensure proper execution, almost all accesses through handles required two indirect accesses, instead of one with a plain old pointer. This could hurt performance.
  • Every API provided by the OS or any library had to specify whether it could possibly "move memory". If you called one of these functions, all your pointers obtained via handles were now invalid. There wasn't a way to have the IDE do this for you or check you, since the moves-memory call and the pointer that became invalid might not even be in the same source file.
  • Performance became nondeterministic, because you never knew when the OS would pause to compact your memory (which involved a lot of memcpy() work).
  • Multithreading became difficult because one thread could move memory while another was executing or blocked, invalidating its pointers. Remember, handles had to be used for almost all memory allocation to keep from fragmenting the heap, so threads were still likely to need access to memory via a handle even if they used none of the Mac OS APIs.
  • There were function calls for locking and unlocking the pointers pointed to by handles; however, too much locking hurt performance and fragmented the heap.

There are probably several more that I forgot. Remember, all these disadvantages were still more palatable than using only pointers and quickly fragmenting the heap, especially on the first Macs, which only had 128K of RAM. This also gives some insight into why Apple was perfectly happy to ditch all this and go to BSD when they had the chance, once their entire product line had memory management units.

Mike DeSimone answered Sep 25 '22 12:09



First of all, let's get the nitpicky stuff out of the way. main returns int, not void. Unless your compiler documentation specifically states that it supports void main(), use int main(void) or int main(int argc, char **argv).

Now let's step back a minute and talk about the differences between pointers and arrays. The first thing to remember is that arrays and pointers are completely different things. You may have heard or read somewhere that an array is just a pointer; this is incorrect. Under most circumstances, an array expression will have its type implicitly converted from "N-element array of T" to "pointer to T" (the type decays to a pointer type) and its value set to point to the first thing in the array, the exceptions being when the array expression is an operand of either the sizeof or address-of (&) operators, or when the array expression is a string literal being used to initialize another array.

An array is a block of memory sized to hold N elements of type T; a pointer is a block of memory sized to hold the address of a single value of type T. You cannot assign a new value to an array object; i.e., the following is not allowed:

int a[10], b[10];
a = b;

Note that a string literal (such as "aaa") is also an array expression; the type is N-element array of char (const char in C++), where N is the length of the string plus the terminating 0. String literals have static extent; they are allocated at program startup and exist until the program exits. They are also unwritable (attempting to modify the contents of a string literal results in undefined behavior). For example, the type of the expression "aaa" is 4-element array of char with static extent. Like other array expressions, string literals decay from array types to pointer types in most circumstances. When you write something like

char *p = "aaa";

the array expression "aaa" decays from char [4] to char *, and its value is the address of the first 'a' of the array; that address is then copied to p.

If the literal is being used to initialize an array of char, however:

char a[] = "aaa";

then the type is not converted; the literal is still treated as an array, and the contents of the array are copied to a (and a is implicitly sized to hold the string contents plus the 0 terminator). The result is roughly equivalent to writing

char a[4];
strcpy(a, "aaa");

When an array expression of type T a[N] is the operand of the sizeof operator, the result is the size of the entire array in bytes: N * sizeof(T). When it's the operand of the address-of (&) operator, the result is a pointer to the entire array, not a pointer to the first element (in practice, these are the same value, but the types are different):

Declaration: T a[N];  

 Expression   Type        "Decays" to  Value
 ----------   ----        -----------  ------
          a   T [N]       T *          address of a[0]
         &a   T (*)[N]                 address of a
   sizeof a   size_t                   number of bytes in a (N * sizeof(T))
       a[i]   T                        value of a[i]
      &a[i]   T *                      address of a[i]
sizeof a[i]   size_t                   number of bytes in a[i] (sizeof(T))

Note that the array expression a decays to type T *, or pointer to T. This is the same type as the expression &a[0]. Both of these expressions yield the address of the first element in the array. The expression &a is of type T (*)[N], or pointer to N-element array of T, and it yields the address of the array itself, not the first element. Since the address of the array is the same as the address of the first element of the array, a, &a, and &a[0] all yield the same value, but the expressions are not all the same type. This will matter when trying to match up function definitions to function calls. If you want to pass an array as a parameter to a function, like

int a[10];
...
foo(a);

then the corresponding function definition must be

void foo(int *p) { ... }

What foo receives is a pointer to int, not an array of int. Note that you can call it as either foo(a) or foo(&a[0]) (or even foo(&v), where v is a simple int variable, although if foo is expecting an array that will cause problems). Note that in the context of a function parameter declaration, int a[] is the same as int *a, but that's only true in this context. Frankly, I think the int a[] form is responsible for a lot of confused thinking about pointers, arrays, and functions, and its use should be discouraged.

If you want to pass a pointer to an array to a function, such as

int a[10];
foo(&a);

then the corresponding function definition must be

void foo(int (*p)[10]) {...}

and when you want to reference a specific element, you must dereference the pointer before applying the subscript:

for (i = 0; i < 10; i++)
  (*p)[i] = i * i;

Now let's throw a monkey wrench into the works and add a second dimension to the array:

Declaration: T a[M][N];

  Expression   Type        "Decays" to  Value
  ----------   ----        -----------  ------
           a   T [M][N]    T (*)[N]     address of a[0]
          &a   T (*)[M][N]              address of a
    sizeof a   size_t                   number of bytes in a (M * N * sizeof(T))
        a[i]   T [N]       T *          address of a[i][0]
       &a[i]   T (*)[N]                 address of a[i]
 sizeof a[i]   size_t                   number of bytes in a[i] (N * sizeof(T))
     a[i][j]   T                        value of a[i][j]
    &a[i][j]   T *                      address of a[i][j]

Note that in this case, both a and a[i] are array expressions, so their respective array types will decay to pointer types in most circumstances; a will be converted from type "M-element array of N-element array of T" to "pointer to N-element array of T", and a[i] will be converted from "N-element array of T" to "pointer to T". And again, a, &a, a[0], &a[0], and &a[0][0] will all yield the same value (the address of the beginning of the array), but they will not all be the same type. If you want to pass a 2d array to a function, like:

int a[10][20];
foo(a);

then the corresponding function definition must be

void foo(int (*p)[20]) {...}

Notice that this is identical to passing a pointer to a 1-d array (other than the size of the array in the examples being different). In this case, however, you would apply a subscript to the pointer, like

for (i = 0; i < 10; i++)
  for (j = 0; j < 20; j++)
    p[i][j] = i * j;

You don't have to explicitly dereference p in this case, because the expression p[i] implicitly dereferences it (p[i] == *(p + i)).

Now let's look at pointer expressions:

Declaration: T *p;
Expression   Type    Value
----------   ----    ------
         p   T *     address of another object of type T
        *p   T       value of another object of type T
        &p   T **    address of the pointer 
  sizeof p   size_t  number of bytes in pointer (depends on type and platform,
                      anywhere between 4 and 8 on common desktop architectures)
 sizeof *p   size_t  number of bytes in T
 sizeof &p   size_t  number of bytes in pointer to pointer (again, depends
                      on type and platform)

This is all pretty straightforward. A pointer type holds the address of another object of type T; dereferencing the pointer (*p) yields the value at that address, and taking the address of the pointer (&p) yields the location of the pointer object (a pointer to a pointer). Applying sizeof to a pointer value will yield the number of bytes in the pointer, not the number of bytes in what the pointer is pointing to.

Now, assuming you've made it this far and haven't yet died of ennui, let's see how all of that applies to your code.

You're wanting to create an array of pointers to char and initialize it with three string literals, so you would declare it as

char *list[] = {"aaa", "bbb", "ccc"};

The list array is implicitly sized to hold 3 elements of type char *. Even though the string literals "aaa", "bbb", and "ccc" appear in an initializer, they are not being used to initialize an array of char; therefore, they decay from expressions of type char [4] to type char *. Each of these pointer values is copied to the elements of list.

When you pass list to a function, such as

foo(list);

the type of list decays from "3-element array of pointer to char" (char *[3]) to "pointer to pointer to char" (char **), so the receiving function must have a definition of

void foo(char **p) {...}

Since subscripting is defined in terms of pointer arithmetic, you can use the subscript operator on the pointer as though it were an array of char *:

for (i = 0; i < 3; i++)
  printf("%s\n", p[i]);

Incidentally, this is how main receives argv, as a pointer to pointer to char (char **), not as an array of pointer to char. Remember, in terms of a function parameter declaration, a[] is identical to *a, so char *argv[] is identical to char **argv.

Now, because I can't seem to stop typing and get back to work (chasing down deadlocks is not fun), let's explore using pointers and dynamically allocated memory.

If you wanted to allocate your list dynamically at run time (i.e., you won't know how many strings are in your list ahead of time), you would declare list as a pointer to pointer to char, and then call malloc to actually allocate the memory for it:

char **list;
size_t number_of_strings;
...
list = malloc(number_of_strings * sizeof *list);
list[0] = "aaa";
list[1] = "bbb";
list[2] = "ccc";
...

Since these are assignments and not initializations, the literal expressions decay into pointers to char, so we're copying the addresses of "aaa", "bbb", etc., to the entries in list. In this case, list is not an array type; it is simply a pointer to a chunk of memory allocated somewhere else (in this case, from the malloc heap). Again, since array subscripting is defined in terms of pointer arithmetic, you can apply the subscript operator to a pointer value as though it were an array. The type of the expression list[i] is char *. There are no implicit conversions to worry about; if you pass it to a function as

foo(list)

then the function definition would be

void foo(char **list) {...}

and you would subscript list as though it were an array.

pssst...is he done?

Yeah, I think he's done.

John Bode answered Sep 23 '22 12:09
