
Creating arrays in C

Tags: arrays, c, pointers

I am attempting to create a UNIX shell in C. If it were in Java, it would be a piece of cake, but I am not so experienced in C. Arrays in C confuse me a bit. I am not sure how to declare or access certain data structures.

I would like to create a string to read in each line. Easy enough: simply an array of characters. I would initialize it as follows:

char line[256]; //Maximum size of each line is 255 characters

And to access an element of this array, I would do as follows:

line[0] = 'a'; //Sets element 0 to 'a'
fgets( line, sizeof line, stdin ); //Gets a line from stdin and places it in line

How does declaring and using a string in this manner differ from declaring it as a pointer? From my understanding, an array in C decays to a pointer. So, would the following be equivalent?

char *line = (char*) malloc( sizeof(char) * 256 );
line[0] = 'a';
fgets( *line, sizeof(line), stdin );

When do you use the pointer character '*', and when don't you? In the example above, is including the '*' in fgets necessary, or correct?

Now, I would like to create an array of strings, or rather, an array of pointers which point to strings. Would I do so as follows?

char *arr[20]; // Declares an array of strings with 20 elements

And how would I access it?

arr[0] = "hello"; // Sets element zero of arr to "hello"

Is this correct?

How would I pass this array to a function?

execvp("ls", arr); // Executes ls with argument vector arr

Is that correct, or would I use the pointer *arr? If so, why?

Now even worse, I would like an array of arrays of strings (for example, if I wanted to hold multiple argument vectors, in order to execute multiple commands in pipe sequence). Would it be declared as follows?

char **vector_arr[20]; // An array of arrays of strings

And how would I access an element of this array?

execvp("ls", vector_arr[0]); // Executes ls with first element of vector_arr as argument vector

I thought I had a decent understanding of what a pointer is, and even of how arrays relate to pointers, but I am having trouble relating this to actual code. When dealing with pointers, I don't know when to write *var, var, or &var.

Johndt asked Feb 14 '14 02:02


1 Answer

Let's talk about expressions and types as they relate to arrays in C.

Arrays

When you declare an array like

char line[256];

the expression line has type "256-element array of char"; except when this expression is the operand of the sizeof or unary & operators, it will be converted ("decay") to an expression of type "pointer to char", and the value of the expression will be the address of the first element of the array. Given the above declaration, all of the following are true:

 Expression             Type            Decays to            Equivalent value
 ----------             ----            ---------            ----------------
       line             char [256]      char *               &line[0]
      &line             char (*)[256]   n/a                  &line[0]
      *line             char            n/a                  line[0]
    line[i]             char            n/a                  n/a
   &line[0]             char *          n/a                  n/a
sizeof line             size_t          n/a                  Total number of bytes 
                                                               in array (256)

Note that the expressions line, &line, and &line[0] all yield the same value (the address of the first element of the array is the same as the address of the array itself); only the types differ. In the expression &line, the array expression is the operand of the & operator, so the conversion rule above doesn't apply; instead of a pointer to char, we get a pointer to a 256-element array of char. Type matters; if you write something like the following:

char line[256];
char *linep = line;
char (*linearrp)[256] = &line;

printf( "linep    + 1 = %p\n", (void *) (linep + 1) );
printf( "linearrp + 1 = %p\n", (void *) (linearrp + 1) );

you'd get different output for each line; linep + 1 would give the address of line[1], the char immediately after line[0], while linearrp + 1 would give the address of the next 256-element array of char following line, 256 bytes further on.
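To make the difference measurable rather than eyeballing printed addresses, here is a minimal sketch that returns the distance in bytes between p and p + 1 for each pointer type. The function names are made up for illustration; only the declarations of line, linep, and linearrp come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between p and p + 1 for a pointer to char. */
size_t stride_of_char_pointer(void)
{
    char line[256];
    char *linep = line;                      /* line decays to &line[0] */

    return (size_t)((char *)(linep + 1) - (char *)linep);
}

/* Distance in bytes between p and p + 1 for a pointer to a
   256-element array of char. */
size_t stride_of_array_pointer(void)
{
    char line[256];
    char *linep = line;
    char (*linearrp)[256] = &line;           /* no decay: operand of & */

    /* Same address, different types. */
    assert((void *)linep == (void *)linearrp);

    return (size_t)((char *)(linearrp + 1) - (char *)linearrp);
}
```

The first function should return 1 and the second 256, since pointer arithmetic is scaled by the size of the pointed-to type.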

The expression line is not a modifiable lvalue; you cannot assign to it, so something like

char temp[256];
...
line = temp;

would be illegal. No storage is set aside for a variable line separate from line[0] through line[255]; there's nothing to assign to.

Because of this conversion, when you pass an array expression to a function, what the function receives is a pointer value, not an array. In the context of a function parameter declaration, T a[N] and T a[] are interpreted as T *a; all three declare a as a pointer to T. The "array-ness" of the parameter has been lost in the course of the call.

All array accesses are done in terms of pointer arithmetic; the expression a[i] is evaluated as *(a + i). The array expression a is first converted to an expression of pointer type as per the rule above, then we offset i elements from that address and dereference the result.
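As a rough check of that equivalence, the following sketch compares the subscript form with the explicit pointer-arithmetic form. The function name is hypothetical. The i[a] spelling is included only as a curiosity that follows from addition being commutative; nobody should write it in real code.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if all spellings name the same element, 0 otherwise. */
int subscript_forms_agree(const char *a, size_t i)
{
    assert(a != NULL);

    return &a[i] == a + i        /* a[i] is *(a + i), so &a[i] is a + i */
        && a[i] == *(a + i)
        && a[i] == i[a];         /* *(i + a) is the same element */
}
```

Called with any valid buffer and in-range index, for example a char line[256] and i = 1, it should return 1.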

Unlike Java, C does not set aside storage for a pointer to the array separate from the array elements themselves: all that's set aside is the following:

+---+
|   | line[0]
+---+
|   | line[1]
+---+
 ...
+---+
|   | line[255]
+---+

Nor does C allocate memory for arrays from the heap (for whatever definition of heap). If the array is declared auto (that is, local to a block and without the static keyword), the memory will be allocated from wherever the implementation gets memory for local variables (what most of us call the stack). If the array is declared at file scope or with the static keyword, the memory will be allocated from a different memory segment, and it will be allocated at program start and held until the program terminates.

Also unlike Java, C arrays contain no metadata about their length; C assumes you knew how big the array was when you allocated it, so you can track that information yourself.
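One common way to track the length yourself is to compute it where the array is still an array and pass it along explicitly. This is only a sketch: ARRAY_LEN and count_char are names invented here, not anything from the standard library.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array. Must not be applied to a pointer,
   where it would silently compute the wrong value. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Counts occurrences of c in the first len bytes of buf.
   The callee only receives a pointer, so the caller supplies len. */
size_t count_char(const char *buf, size_t len, char c)
{
    size_t n = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] == c)
            n++;
    return n;
}
```

A caller that owns char line[256] would write count_char( line, ARRAY_LEN(line), '-' ); inside count_char itself there is no way to recover the 256.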

Pointers

When you declare a pointer like

char *line;

the expression line has type "pointer to char" (duh). Enough storage is set aside to store the address of a char object. Unless you declare it at file scope or with the static keyword, it won't be initialized and will contain some random bit pattern that may or may not correspond to a valid address. Given the above declaration, all of the following are true:

 Expression             Type            Decays to            Equivalent value
 ----------             ----            ---------            ----------------
       line             char *          n/a                  n/a
      &line             char **         n/a                  n/a
      *line             char            n/a                  line[0]
    line[i]             char            n/a                  n/a
   &line[0]             char *          n/a                  n/a
sizeof line             size_t          n/a                  Total number of bytes
                                                               in a char pointer
                                                               (anywhere from 2 to
                                                               8 depending on the
                                                               platform)

In this case, line and &line do give us different values, as well as different types; line is a simple scalar object, so &line gives us the address of that object. Again, array accesses are done in terms of pointer arithmetic, so line[i] works the same whether line is declared as an array or as a pointer.
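The two tables can be compared side by side with a small sketch like the one below. Both function names are hypothetical, and the local variable c exists only to give the pointer something valid to point at.

```c
#include <assert.h>
#include <stddef.h>

/* line declared as an array: sizeof sees the whole array. */
size_t size_when_array(void)
{
    char line[256];

    /* The array and its first element share an address. */
    assert((void *)&line == (void *)&line[0]);
    return sizeof line;
}

/* line declared as a pointer: sizeof sees only the pointer object. */
size_t size_when_pointer(void)
{
    char c = 'a';
    char *line = &c;

    /* The pointer variable lives somewhere other than what it points to. */
    assert((void *)&line != (void *)line);
    return sizeof line;
}
```

The first function should return 256; the second returns sizeof (char *), whatever that is on the platform.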

So when you write

char *line = malloc( sizeof *line * 256 ); // note no cast, sizeof expression

this is the case that works like Java; you have a separate pointer variable that references storage that's allocated from the heap, like so:

+---+ 
|   | line -------+
+---+             |
 ...              |
+---+             |
|   | line[0] <---+
+---+
|   | line[1]
+---+
 ...
+---+
|   | line[255]
+---+

Unlike Java, C won't automatically reclaim this memory when there are no more references to it. You'll have to explicitly deallocate it when you're finished with it using the free library function:

free( line );
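Putting the allocation and the cleanup together, a minimal sketch might look like this. The helper name new_line_buffer is an assumption for illustration; the important parts are checking the result of malloc and making ownership explicit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a zero-filled buffer of size chars, or NULL on failure.
   The caller owns the result and must pass it to free(). */
char *new_line_buffer(size_t size)
{
    char *line;

    assert(size > 0);
    line = malloc(sizeof *line * size);   /* no cast, sizeof expression */
    if (line != NULL)
        memset(line, 0, size);
    return line;
}
```

A caller would write char *line = new_line_buffer( 256 );, test the result against NULL, use line[i] exactly as with an array, and finish with free( line );.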

As for your specific questions:

fgets( *line, sizeof(line), stdin );

When do you use the pointer character '*', and when don't you? In the example above, is including the '*' in fgets necessary, or correct?

It is not correct; fgets expects the first argument to have type "pointer to char"; the expression *line has type char. This follows from the declaration:

char *line; 

Secondly, sizeof(line) only gives you the size of the pointer, not the size of what the pointer points to; unless you want to read exactly sizeof (char *) bytes, you'll have to use a different expression to specify the number of characters to read:

fgets( line, 256, stdin );
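Since the buffer size has to travel with the pointer anyway, it can help to wrap the call in a small function. The sketch below is one possible shape, with read_line as a made-up name; stripping the trailing newline is an extra convenience for a shell, not something fgets does for you.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from fp into buf (capacity size) and strips the
   trailing newline, if any. Returns 1 on success, 0 on end of file
   or error. */
int read_line(char *buf, size_t size, FILE *fp)
{
    assert(buf != NULL && size > 0 && fp != NULL);

    if (fgets(buf, (int)size, fp) == NULL)   /* pass buf, not *buf */
        return 0;
    buf[strcspn(buf, "\n")] = '\0';          /* drop the newline */
    return 1;
}
```

With an array, the call is read_line( line, sizeof line, stdin ); with a malloc'ed buffer, pass the same count you gave to malloc instead of sizeof line.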

Now, I would like to create an array of strings, or rather, an array of pointers which point to strings. Would I do so as follows?

char *arr[20]; // Declares an array of strings with 20 elements

C doesn't have a separate "string" datatype the way C++ or Java do; in C, a string is simply a sequence of character values terminated by a 0. They are stored as arrays of char. Note that all you've declared above is a 20-element array of pointers to char; those pointers can point to things that aren't strings.

If all of your strings are going to have the same maximum length, you can declare a 2D array of char like so:

char arr[NUM_STRINGS][MAX_STRING_LENGTH + 1]; // +1 for 0 terminator

and then you would assign each string as

strcpy( arr[i], "some string" );
strcpy( arr[j], some_other_variable );
strncpy( arr[k], another_string_variable, MAX_STRING_LENGTH );

although beware of strncpy; it won't append the 0 terminator to the destination if the source string has at least as many characters as the count you pass. You'll have to make sure the terminator is present before using the result with the rest of the string library.
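One way to guarantee the terminator is to copy at most one character fewer than the destination holds and then write the 0 yourself. This is a sketch only; copy_truncated is a hypothetical helper, and silently truncating may not be the right policy for every program.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dst (capacity dst_size), truncating if necessary.
   dst is always 0-terminated on return. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    strncpy(dst, src, dst_size - 1);   /* may leave dst unterminated */
    dst[dst_size - 1] = '\0';          /* so terminate explicitly */
}
```

For the 2D array above, the call would be copy_truncated( arr[k], sizeof arr[k], another_string_variable ); this works because arr[k] is itself an array of MAX_STRING_LENGTH + 1 chars, so sizeof gives its full capacity.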

If you want to allocate space for each string separately, you can declare the array of pointers, then allocate each pointer:

char *arr[NUM_STRINGS];
...
arr[i] = malloc( strlen("some string") + 1 );
strcpy( arr[i], "some string" );
...
arr[j] = strdup( "some string" ); // not available in all implementations, calls
                                  // malloc under the hood
...
arr[k] = "some string";  // arr[k] contains the address of the *string literal*
                         // "some string"; note that you may not modify the contents
                         // of a string literal (the behavior is undefined), so 
                         // arr[k] should not be used as an argument to any function
                         // that tries to modify the input parameter.

Note that each element of arr is a pointer value; whether these pointers point to strings (0-terminated sequences of char) or not is up to you.
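As for passing such an array to execvp: you pass arr itself, not *arr. The array expression decays to a char **, which matches the char *const argv[] parameter, and execvp requires the element after the last argument to be a null pointer. The sketch below assumes a POSIX system; count_args and run_and_wait are names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counts the entries before the terminating null pointer. */
size_t count_args(char *const argv[])
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        n++;
    return n;
}

/* Runs argv[0] with the given argument vector in a child process.
   Returns the child's exit status, or -1 on failure. */
int run_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* only returns if the exec failed */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller could declare char *arr[20] = { "ls", "-l", NULL }; and call run_and_wait( arr );. String literals are fine here because execvp does not modify the strings.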

Now even worse, I would like an array of arrays of strings (for example, if I wanted to hold multiple argument vectors, in order to execute multiple commands in pipe sequence). Would it be declared as follows?

char **vector_arr[20]; // An array of arrays of strings

What you've declared is an array of pointers to pointers to char; note that this is perfectly valid if you don't know how many pointers to char you need to store in each element. However, if you know the maximum number of arguments per element, it may be clearer to write

char *vector_arr[20][N];

Otherwise, you'd have to allocate each array of char * dynamically:

char **vector_arr[20] = { NULL }; // initialize all the pointers to NULL

size_t i, j;

for ( i = 0; i < 20; i++ )
{
  // the type of the expression vector_arr is 20-element array of char **, so
  // the type of the expression vector_arr[i] is char **, so
  // the type of the expression *vector_arr[i] is char *, so
  // the type of the expression vector_arr[i][j] is char *, so
  // the type of the expression *vector_arr[i][j] is char

  vector_arr[i] = malloc( sizeof *vector_arr[i] * num_args_for_this_element );
  if ( vector_arr[i] )
  {
    for ( j = 0; j < num_args_for_this_element; j++ )
    {
      vector_arr[i][j] = malloc( sizeof *vector_arr[i][j] * (size_of_this_element + 1) );
      if ( vector_arr[i][j] )
      {
        // assign the argument
        strcpy( vector_arr[i][j], argument_for_this_element );
      }
    }
  }
}

So, each element of vector_arr points to the first element of a dynamically allocated N-element array of pointers, each of which points to the first element of an M-element array of char.
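Everything allocated this way also has to be released, innermost strings first. Here is a sketch of a matched pair of helpers for one element of vector_arr. The names copy_argv and free_argv are hypothetical, and the extra null slot at the end is an assumption that the vector will eventually be handed to something like execvp.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees a null-terminated vector produced by copy_argv. */
void free_argv(char **argv)
{
    if (argv == NULL)
        return;
    for (size_t j = 0; argv[j] != NULL; j++)
        free(argv[j]);             /* each string first */
    free(argv);                    /* then the array of pointers */
}

/* Deep-copies n strings into a freshly allocated, null-terminated
   vector. Returns NULL if any allocation fails. */
char **copy_argv(const char *const *src, size_t n)
{
    char **argv;

    assert(src != NULL);
    argv = malloc(sizeof *argv * (n + 1));   /* +1 for the sentinel */
    if (argv == NULL)
        return NULL;
    for (size_t j = 0; j <= n; j++)
        argv[j] = NULL;            /* keeps free_argv safe on partial fill */

    for (size_t j = 0; j < n; j++) {
        argv[j] = malloc(strlen(src[j]) + 1);
        if (argv[j] == NULL) {
            free_argv(argv);
            return NULL;
        }
        strcpy(argv[j], src[j]);
    }
    return argv;
}
```

With these, vector_arr[i] = copy_argv( args, count ); fills one slot, and free_argv( vector_arr[i] ); releases it when the command has run.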

John Bode answered Oct 04 '22 15:10