I am attempting to create a UNIX shell in C. If it were in Java, it would be a piece of cake, but I am not so experienced in C. Arrays in C confuse me a bit. I am not sure how to declare or access certain data structures.
I would like to create a string to read in each line. Easy enough: simply an array of characters. I would initialize it as follows:
char line[256]; //Maximum size of each line is 255 characters
And to access an element of this array, I would do as follows:
line[0] = 'a'; //Sets element 0 to 'a'
fgets( line, sizeof line, stdin ); //Gets a line from stdin and places it in line
How does declaring and using a string in this manner differ from declaring it as a pointer? From my understanding, an array in C decays to a pointer. So, would the following be equivalent?
char *line = (char*) malloc( sizeof(char) * 256 );
line[0] = 'a';
fgets( *line, sizeof(line), stdin );
When do you use the pointer character '*', and when don't you? In the example above, is including the '*' in fgets necessary, or correct?
Now, I would like to create an array of strings, or rather, an array of pointers which point to strings. Would I do so as follows?
char *arr[20]; // Declares an array of strings with 20 elements
And how would I access it?
arr[0] = "hello"; // Sets element zero of arr to "hello"
Is this correct?
How would I pass this array to a function?
execvp("ls", arr); // Executes ls with argument vector arr
Is that correct, or would I use the pointer *arr? If so, why?
Now even worse, I would like an array of arrays of strings (for example, if I wanted to hold multiple argument vectors, in order to execute multiple commands in pipe sequence). Would it be declared as follows?
char **vector_arr[20]; // An array of arrays of strings
And how would I access an element of this array?
execvp("ls", vector_arr[0]); // Executes ls with first element of vector_arr as argument vector
I thought that I grasped a decent understanding of what a pointer is, and even how arrays relate to pointers, however I seem to be having trouble relating this to the actual code. I guess that when dealing with pointers, I don't know when to reference *var, var, or &var.
Let's talk about expressions and types as they relate to arrays in C.
Arrays
When you declare an array like
char line[256];
the expression line has type "256-element array of char"; except when this expression is the operand of the sizeof or unary & operators, it will be converted ("decay") to an expression of type "pointer to char", and the value of the expression will be the address of the first element of the array. Given the above declaration, all of the following are true:
Expression    Type            Decays to    Equivalent value
----------    ----            ---------    ----------------
line          char [256]      char *       &line[0]
&line         char (*)[256]   n/a          &line[0]
*line         char            n/a          line[0]
line[i]       char            n/a          n/a
&line[0]      char *          n/a          n/a
sizeof line   size_t          n/a          Total number of bytes in array (256)
Note that the expressions line, &line, and &line[0] all yield the same value (the address of the first element of the array is the same as the address of the array itself); it's just that the types are different. In the expression &line, the array expression is the operand of the & operator, so the conversion rule above doesn't apply; instead of a pointer to char, we get a pointer to a 256-element array of char. Type matters; if you write something like the following:
char line[256];
char *linep = line;
char (*linearrp)[256] = &line;
printf( "linep + 1 = %p\n", (void *) (linep + 1) );
printf( "linearrp + 1 = %p\n", (void *) (linearrp + 1) );
you'd get different output for each line; linep + 1 would give the address of the char following line[0] (that is, &line[1]), while linearrp + 1 would give the address of the next 256-element array of char following line (256 bytes past the start of line).
The expression line is not a modifiable lvalue; you cannot assign to it, so something like
char temp[256];
...
line = temp;
would be illegal. No storage is set aside for a variable line separate from line[0] through line[255]; there's nothing to assign to.
Because of this, when you pass an array expression to a function, what the function receives is a pointer value, not an array. In the context of a function parameter declaration, T a[N] and T a[] are interpreted as T *a; all three declare a as a pointer to T. The "array-ness" of the parameter has been lost in the course of the call.
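A minimal sketch of that loss of "array-ness", assuming two made-up helper names (size_seen_by_callee and size_seen_by_caller are illustrative only, not from any library):

```c
#include <stddef.h>

/* The parameter could also be spelled "char buf[256]" or "char buf[]";
 * all three spellings declare buf as a plain char *. */
size_t size_seen_by_callee(char *buf)
{
    return sizeof buf;      /* size of a pointer, not of the caller's array */
}

size_t size_seen_by_caller(void)
{
    char line[256];
    return sizeof line;     /* 256: here line is still an array */
}
```

The caller's result is 256, while the callee's is just sizeof (char *), whatever that happens to be on your platform.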
All array accesses are done in terms of pointer arithmetic; the expression a[i] is evaluated as *(a + i). The array expression a is first converted to an expression of pointer type as per the rule above, then we offset i elements from that address and dereference the result.
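To make the a[i] versus *(a + i) equivalence concrete, here is a small sketch with two hypothetical functions (the names are mine) that measure a string's length, one with subscripts and one with explicit pointer arithmetic:

```c
#include <stddef.h>

/* Walks the string with subscript notation: s[i] means *(s + i). */
size_t length_by_index(const char *s)
{
    size_t i = 0;
    while (s[i] != '\0')
        i++;
    return i;
}

/* Walks the same string by advancing a pointer and dereferencing it. */
size_t length_by_pointer(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t) (p - s);   /* number of chars stepped over */
}
```

Both accept either an array or a pointer as the argument, since an array argument decays to a pointer before the call.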
Unlike Java, C does not set aside storage for a pointer to the array separate from the array elements themselves: all that's set aside is the following:
+---+
| | line[0]
+---+
| | line[1]
+---+
...
+---+
| | line[255]
+---+
Nor does C allocate memory for arrays from the heap (for whatever definition of heap). If the array is declared auto (that is, local to a block and without the static keyword), the memory will be allocated from wherever the implementation gets memory for local variables (what most of us call the stack). If the array is declared at file scope or with the static keyword, the memory will be allocated from a different memory segment, and it will be allocated at program start and held until the program terminates.
Also unlike Java, C arrays contain no metadata about their length; C assumes you knew how big the array was when you allocated it, so you can track that information yourself.
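One common way to track the length yourself is to compute it where the array is still an array and pass it along as a separate argument. A sketch, assuming a hypothetical ARRAY_LEN macro and sum function (neither is standard):

```c
#include <stddef.h>

/* Element count of a real array; gives the wrong answer if handed a pointer. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* The function only sees a pointer, so the count travels with it. */
int sum(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

int sum_example(void)
{
    int data[] = { 25, 50, 75, 100 };
    return sum(data, ARRAY_LEN(data));   /* length computed in the caller */
}
```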
Pointers
When you declare a pointer like
char *line;
the expression line has type "pointer to char" (duh). Enough storage is set aside to store the address of a char object. Unless you declare it at file scope or with the static keyword, it won't be initialized and will contain some random bit pattern that may or may not correspond to a valid address. Given the above declaration, all of the following are true:
Expression    Type            Decays to    Equivalent value
----------    ----            ---------    ----------------
line          char *          n/a          n/a
&line         char **         n/a          n/a
*line         char            n/a          line[0]
line[i]       char            n/a          n/a
&line[0]      char *          n/a          n/a
sizeof line   size_t          n/a          Total number of bytes in a char pointer
                                           (anywhere from 2 to 8 depending on the platform)
In this case, line and &line do give us different values, as well as different types; line is a simple scalar object, so &line gives us the address of that object. Again, array accesses are done in terms of pointer arithmetic, so line[i] works the same whether line is declared as an array or as a pointer.
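The difference between the two tables can be checked directly. This is only a sketch with invented function names; it compares the value of line with the value of &line in each case:

```c
#include <stdbool.h>

/* Array case: line and &line have different types but the same address. */
bool array_and_its_address_coincide(void)
{
    char line[256];
    return (void *) line == (void *) &line;
}

/* Pointer case: line holds the address of storage[0], while &line is the
 * address of the pointer variable itself, which is a separate object. */
bool pointer_and_its_address_coincide(void)
{
    char storage[256];
    char *line = storage;
    return (void *) line == (void *) &line;
}
```

The first function returns true and the second returns false, which is the "same value" versus "different values" distinction described above.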
So when you write
char *line = malloc( sizeof *line * 256 ); // note no cast, sizeof expression
this is the case that works like Java; you have a separate pointer variable that references storage that's allocated from the heap, like so:
+---+
| | line -------+
+---+ |
... |
+---+ |
| | line[0] <---+
+---+
| | line[1]
+---+
...
+---+
| | line[255]
+---+
Unlike Java, C won't automatically reclaim this memory when there are no more references to it. You'll have to explicitly deallocate it when you're finished with it using the free library function:
free( line );
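Allocation can fail, so it is worth checking the result of malloc before using it. A minimal sketch, assuming a hypothetical helper named new_line_buffer:

```c
#include <stdlib.h>

/* Allocates a buffer for `capacity` chars and makes it an empty string.
 * Returns NULL if the allocation fails; the caller must free() the result. */
char *new_line_buffer(size_t capacity)
{
    if (capacity == 0)
        return NULL;

    char *line = malloc(sizeof *line * capacity);
    if (line != NULL)
        line[0] = '\0';
    return line;
}
```

A caller would write something like char *line = new_line_buffer(256);, test it against NULL, use it, and finish with free( line );.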
As for your specific questions:
fgets( *line, sizeof(line), stdin );
When do you use the pointer character '*', and when don't you? In the example above, is including the '*' in fgets necessary, or correct?
It is not correct; fgets expects the first argument to have type "pointer to char"; the expression *line has type char. This follows from the declaration:
char *line;
Secondly, sizeof(line) only gives you the size of the pointer, not the size of what the pointer points to; unless you want to read at most sizeof (char *) - 1 characters, you'll have to use a different expression to specify the size of the buffer:
fgets( line, 256, stdin );
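Since a pointer carries no size, one option is to pass the buffer size alongside it. The following is a sketch only; read_line is a made-up helper, and stripping the trailing newline is my own addition that a shell usually wants:

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from `in` into `line`, which has room for `size` chars.
 * Returns 1 if a line was read, 0 on end of file, error, or a bad size. */
int read_line(char *line, size_t size, FILE *in)
{
    if (size == 0 || size > INT_MAX)
        return 0;                       /* fgets takes the size as an int */

    if (fgets(line, (int) size, in) == NULL)
        return 0;

    line[strcspn(line, "\n")] = '\0';   /* drop the newline, if one was stored */
    return 1;
}
```

With an array you can call read_line( line, sizeof line, stdin ); with a malloc'ed buffer you pass the same number you gave to malloc.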
Now, I would like to create an array of strings, or rather, an array of pointers which point to strings. Would I do so as follows?
char *arr[20]; // Declares an array of strings with 20 elements
C doesn't have a separate "string" datatype the way C++ or Java do; in C, a string is simply a sequence of character values terminated by a 0. They are stored as arrays of char. Note that all you've declared above is a 20-element array of pointers to char; those pointers can point to things that aren't strings.
If all of your strings are going to have the same maximum length, you can declare a 2D array of char like so:
char arr[NUM_STRINGS][MAX_STRING_LENGTH + 1]; // +1 for 0 terminator
and then you would assign each string as
strcpy( arr[i], "some string" );
strcpy( arr[j], some_other_variable );
strncpy( arr[k], another_string_variable, MAX_STRING_LENGTH );
although beware of strncpy; it won't append the 0 terminator to the destination string if the source string is at least as long as the count you pass. You'll have to make sure the terminator is present before trying to use it with the rest of the string library.
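One way to guard against the missing terminator is to copy at most one character fewer than the destination holds and then write the 0 yourself. A sketch, with copy_truncated as an invented name:

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst (which has room for dst_size chars), truncating if
 * necessary, and always leaves dst 0-terminated. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;                         /* no room even for the terminator */

    strncpy(dst, src, dst_size - 1);    /* may leave dst unterminated ... */
    dst[dst_size - 1] = '\0';           /* ... so terminate it explicitly */
}
```

For the 2D array above, the call would look like copy_truncated( arr[k], sizeof arr[k], another_string_variable );.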
If you want to allocate space for each string separately, you can declare the array of pointers, then allocate each pointer:
char *arr[NUM_STRINGS];
...
arr[i] = malloc( strlen("some string") + 1 );
strcpy( arr[i], "some string" );
...
arr[j] = strdup( "some string" ); // not available in all implementations, calls
// malloc under the hood
...
arr[k] = "some string"; // arr[k] contains the address of the *string literal*
// "some string"; note that you may not modify the contents
// of a string literal (the behavior is undefined), so
// arr[k] should not be used as an argument to any function
// that tries to modify the input parameter.
Note that each element of arr is a pointer value; whether these pointers point to strings (0-terminated sequences of char) or not is up to you.
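For the execvp part of the question, the argument vector has to end with a NULL pointer. Here is a rough sketch of filling such an array from an input line; split_args is a hypothetical helper, and splitting on whitespace with strtok is an assumption about how you want to tokenize (it ignores quoting entirely):

```c
#include <stddef.h>
#include <string.h>

/* Splits `line` in place on spaces, tabs, and newlines. Stores pointers to
 * the pieces in args[], followed by a NULL terminator, using at most
 * max_args slots in total. Returns the number of arguments stored. */
size_t split_args(char *line, char *args[], size_t max_args)
{
    size_t count = 0;

    if (max_args == 0)
        return 0;

    for (char *tok = strtok(line, " \t\n");
         tok != NULL && count + 1 < max_args;
         tok = strtok(NULL, " \t\n"))
    {
        args[count++] = tok;    /* points into line; nothing is copied */
    }
    args[count] = NULL;         /* execvp needs this terminator */
    return count;
}
```

After split_args( line, arr, 20 ), the call shape is execvp( arr[0], arr ): arr is passed without * or &, since the array expression decays to char **, which is what execvp expects. Because strtok writes into line, line must be a modifiable buffer, not a string literal.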
Now even worse, I would like an array of arrays of strings (for example, if I wanted to hold multiple argument vectors, in order to execute multiple commands in pipe sequence). Would it be declared as follows?
char **vector_arr[20]; // An array of arrays of strings
What you've declared is an array of pointers to pointers to char; note that this is perfectly valid if you don't know how many pointers to char you need to store in each element. However, if you know the maximum number of arguments per element, it may be clearer to write
char *vector_arr[20][N];
Otherwise, you'd have to allocate each array of char * dynamically:
char **vector_arr[20] = { NULL }; // initialize all the pointers to NULL
for ( i = 0; i < 20; i++ )
{
// the type of the expression vector_arr is 20-element array of char **, so
// the type of the expression vector_arr[i] is char **, so
// the type of the expression *vector_arr[i] is char *, so
// the type of the expression vector_arr[i][j] is char *, so
// the type of the expression *vector_arr[i][j] is char
vector_arr[i] = malloc( sizeof *vector_arr[i] * num_args_for_this_element );
if ( vector_arr[i] )
{
for ( j = 0; j < num_args_for_this_element; j++ )
{
vector_arr[i][j] = malloc( sizeof *vector_arr[i][j] * (size_of_this_element + 1) );
// assign the argument
strcpy( vector_arr[i][j], argument_for_this_element );
}
}
}
So, each element of vector_arr points to the first element of an N-element array of pointers, and each of those pointers points to the first element of an M-element array of char.
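Everything allocated this way has to be released in the reverse order: first each string, then the array of pointers. A sketch for a single argument vector, assuming (my assumption, not something the code above does) that each vector is allocated with one extra slot holding a NULL terminator, which is also what execvp wants; make_arg_vector and free_arg_vector are invented names:

```c
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated argument vector and every string in it. */
void free_arg_vector(char **args)
{
    if (args == NULL)
        return;
    for (size_t j = 0; args[j] != NULL; j++)
        free(args[j]);
    free(args);
}

/* Builds a NULL-terminated copy of `count` strings. Returns NULL on failure. */
char **make_arg_vector(const char *const words[], size_t count)
{
    char **args = malloc(sizeof *args * (count + 1));   /* +1 for NULL */
    if (args == NULL)
        return NULL;

    for (size_t j = 0; j <= count; j++)
        args[j] = NULL;             /* so a partial vector can be freed */

    for (size_t j = 0; j < count; j++)
    {
        args[j] = malloc(strlen(words[j]) + 1);
        if (args[j] == NULL)
        {
            free_arg_vector(args);  /* release what was built so far */
            return NULL;
        }
        strcpy(args[j], words[j]);
    }
    return args;
}
```

Each vector_arr[i] could then be set with make_arg_vector and later released with free_arg_vector( vector_arr[i] ).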