ISO C requires that hosted implementations call a function named main. If the program receives arguments, they are received as an array of char* pointers, the second argument in main's definition int main(int argc, char* argv[]).
ISO C also requires that the strings pointed to by the argv array be modifiable.
But can the elements of argv alias one another? In other words, can there exist i, j such that
    0 <= i && i < argc
    0 <= j && j < argc
    i != j
    0 < strlen(argv[i])
    strlen(argv[i]) <= strlen(argv[j])
    argv[i] aliases argv[j]
at program start-up? If so, a write through argv[i][0] would also be seen through the aliasing string argv[j].
The relevant clauses of the ISO C Standard are quoted below, but they do not allow me to answer the titular question conclusively.
§ 5.1.2.2.1 Program startup
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
    int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
    int main(int argc, char *argv[]) { /* ... */ }
or equivalent;10) or in some other implementation-defined manner.
If they are declared, the parameters to the main function shall obey the following constraints:
- The value of argc shall be nonnegative.
- argv[argc] shall be a null pointer.
- If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup. The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment. If the host environment is not capable of supplying strings with letters in both uppercase and lowercase, the implementation shall ensure that the strings are received in lowercase.
- If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1] represent the program parameters.
- The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
By my reading, the answer to the titular question is "yes", since nowhere is it explicitly forbidden and nowhere does the standard urge or require the use of char* restrict*-qualified argv, but the answer might turn on the interpretation of "and retain their last-stored values between program startup and program termination."
The practical import of this question is that if the answer to it is indeed "yes", a portable program that wishes to modify the strings in argv must first perform (the equivalent of) POSIX strdup() on them for safety.
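A minimal sketch of that precaution, assuming only ISO C: strdup() is POSIX and entered ISO C only with C23, so the hypothetical helper copy_args() below duplicates each string with malloc() and memcpy() and returns a new, null-terminated vector that the program may modify freely. The names copy_args and free_args are illustrative, not taken from any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly allocated, null-terminated array
 * holding private copies of argv[0..argc-1], or NULL on allocation failure.
 * Uses malloc + memcpy instead of strdup(), which is POSIX (ISO C only
 * since C23). */
char **copy_args(int argc, char *argv[])
{
    char **copy = malloc(((size_t)argc + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;

    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]) + 1;   /* include the terminator */
        copy[i] = malloc(len);
        if (copy[i] == NULL) {
            while (i-- > 0)                 /* free what was already copied */
                free(copy[i]);
            free(copy);
            return NULL;
        }
        memcpy(copy[i], argv[i], len);
    }
    copy[argc] = NULL;                      /* mirror argv[argc] == NULL */
    return copy;
}

/* Hypothetical helper: releases an array produced by copy_args(). */
void free_args(char **copy)
{
    if (copy == NULL)
        return;
    for (size_t i = 0; copy[i] != NULL; i++)
        free(copy[i]);
    free(copy);
}
```

In main(), one would call char **args = copy_args(argc, argv);, check the result for NULL, modify args instead of argv, and release it with free_args(args) when done. Even if two elements of argv alias, their copies are independent.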
By my reading, the answer to the titular question is "yes", since nowhere is it explicitly forbidden and nowhere does the standard urge or require the use of restrict-qualified argv, but the answer might turn on the interpretation of "and retain their last-stored values between program startup and program termination."
I concur that the standard does not explicitly forbid elements of the argument vector from being aliases of each other. I don't think the modifiability and value-retention provisions contradict that position, but they do suggest to me that the committee did not consider the possibility of aliasing.
The practical import of this question is that if the answer to it is indeed "yes", a portable program that wishes to modify the strings in argv must first perform (the equivalent of) POSIX strdup() on them for safety.
Indeed, that's exactly why I think the committee didn't even consider the possibility. If they had, then surely they would at least have included a footnote to that effect, or else explicitly specified that the argument strings are all distinct.
I'm inclined to think that this detail escaped the committee's attention because in practice, implementations do indeed provide distinct strings, and because it is rare, moreover, for programs to modify their argument strings (though modifying argv itself is somewhat more common). If the committee agreed to issue an official interpretation in this area, then I would not be surprised for them to come down against the possibility of aliasing.
Until and unless such an interpretation is issued, however, you are right that strict conformance does not permit you to rely a priori on argv elements not being aliased.
The way it works on common *nix platforms (including Linux and Mac OS, presumably FreeBSD too) is that argv is an array of pointers into a single memory area containing the argument strings one after another (separated only by their null terminators). Using execl() does not change this: even if the caller passes the same pointer multiple times, the source string is copied multiple times, with no special behavior for identical (i.e. aliased) pointers (an uncommon case with little benefit to optimizing).
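One way to observe this layout from inside a program is to check whether each argument begins immediately after the previous one's terminator. The helper args_contiguous() below is a hypothetical sketch; it compares pointers only with !=, which is well-defined even between unrelated objects, and a result of 1 merely confirms the layout described above on a given system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if argv[0..argc-1] are laid out back to
 * back, each string starting right after the previous string's '\0';
 * returns 0 otherwise. Only equality comparisons are used on the pointers,
 * so the check itself is well-defined even for separate objects. */
int args_contiguous(int argc, char *argv[])
{
    for (int i = 0; i + 1 < argc; i++) {
        /* one past the terminator of argv[i] */
        char *next = argv[i] + strlen(argv[i]) + 1;
        if (next != argv[i + 1])
            return 0;
    }
    return 1;
}
```

Calling args_contiguous(argc, argv) at the top of main() would be expected to return 1 on the platforms described above; ISO C guarantees nothing either way.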
However, C does not require this implementation. The truly paranoid may want to copy every string before modifying it, perhaps skipping the copies if memory is limited and a loop over argv shows that none of the pointers actually alias (at least among those the program intends to modify). This seems overly paranoid unless you are developing flight software or the like.
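Such a loop could look like the sketch below (the name args_overlap is hypothetical). It relies on one observation: two null-terminated strings that share any byte also share everything from that byte through the terminator, so they overlap exactly when one of them starts somewhere inside the other. Because relational comparisons (<, >) between pointers into unrelated objects are undefined in ISO C, the check uses only ==, at a cost of roughly argc times the total argument length. It is slightly broader than the condition in the question, since it also flags aliased empty strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if any two distinct elements of argv share
 * storage, 0 otherwise. Overlapping C strings always end at the same '\0',
 * so it suffices to ask whether some argv[i] starts at any byte of another
 * argv[j], terminator included. Only == is used, because <, > on pointers
 * into unrelated objects are undefined in ISO C. */
int args_overlap(int argc, char *argv[])
{
    for (int j = 0; j < argc; j++) {
        size_t len = strlen(argv[j]);
        for (size_t k = 0; k <= len; k++) {      /* every byte incl. '\0' */
            for (int i = 0; i < argc; i++) {
                if (i != j && argv[i] == argv[j] + k)
                    return 1;
            }
        }
    }
    return 0;
}
```

A program would call it once at startup: if args_overlap(argc, argv) returns 1, it falls back to copying the strings first; otherwise it can write to them in place.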
As a data point, I have compiled and run the following programs on several systems. (Disclaimer: these programs are intended to provide a data point, but as we'll see, they do not end up answering the question as stated.)
p1.c:
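#include <stdio.h>
#include <unistd.h>
int main(void)
{
    char test[] = "test";
    /* pass the same pointer twice; the list must end with a (char *) null pointer */
    execl("./p2", "p2", test, test, (char *)NULL);
    perror("execl");   /* reached only if execl fails */
    return 1;
}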
p2.c:
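#include <stdio.h>
int main(int argc, char **argv)
{
    int i;
    if (argc < 2)
        return 1;   /* need at least one argument to modify */
    for (i = 1; i < argc; i++)
        printf("%s ", argv[i]);
    printf("\n");
    argv[1][0] = 'b';   /* write through the first argument only */
    for (i = 1; i < argc; i++)
        printf("%s ", argv[i]);
    printf("\n");
    return 0;
}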
Every place I've tried it (under MacOS and several flavors of Unix and Linux), it has printed
test test
best test
Since the second line was never "best best", this proves that, on the tested systems, by the time the second program is run, the strings are no longer aliased.
Of course, this test does not prove that strings in argv can never be aliased, under any circumstances, on any system out there. I think all it proves is that, unsurprisingly, each of the tested operating systems recopies the argument list at least once between the time p1 calls execl and the time p2 is actually invoked. In other words, the argument vector constructed by the invoking program is not used directly in the called program, and in the process of copying it, it is (again not surprisingly) "normalized", meaning that the effects of any aliasing are lost.
(I say this is not surprising because if you think about the way the exec family of system calls actually works, and the way process memory is laid out under Unix-like systems, there's no way that the invoking program's argument list could be used directly; it has to be copied, at least once, into the address space of the new, exec'ed process. Furthermore, any obvious and straightforward method of copying the argument list is always and automatically going to "normalize" it in this way; the kernel would have to do significant, extra, totally unnecessary work in order to detect and preserve any aliasing.)
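To illustrate why a straightforward copy "normalizes" the vector, here is a user-space model of such a copy. It is not actual kernel code, and the name flatten_args and its parameters are hypothetical. It copies the bytes of each string, not the pointers, into one buffer, so two aliased source pointers become two independent strings, matching the "best test" output above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an exec-style argument copy: appends every string
 * of the NULL-terminated vector src to buf, one after another, and fills
 * out[] with pointers into buf followed by a NULL. Returns the number of
 * strings copied, or -1 if out (out_slots entries) or buf (buf_size bytes)
 * is too small. Aliasing in src is never inspected, so it is not preserved. */
int flatten_args(char *const src[], char *out[], size_t out_slots,
                 char *buf, size_t buf_size)
{
    size_t used = 0;
    size_t n;

    if (out_slots == 0)
        return -1;
    for (n = 0; src[n] != NULL; n++) {
        size_t len = strlen(src[n]) + 1;     /* include the terminator */
        if (n + 1 >= out_slots || len > buf_size - used)
            return -1;                       /* no room for string + NULL */
        memcpy(buf + used, src[n], len);     /* copy bytes, not the pointer */
        out[n] = buf + used;
        used += len;
    }
    out[n] = NULL;
    return (int)n;
}
```

With src = {"p2", test, test, NULL}, the two copies of "test" land at different offsets in buf, so writing through out[1][0] leaves out[2] unchanged; preserving the aliasing would require extra pointer comparisons that a copy like this never makes.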
Just in case it matters, I modified the first program in this way:
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    char test[] = "test";
    char *argv[] = {"p2", test, test, NULL};   /* argv[1] and argv[2] alias */
    execv("./p2", argv);
    perror("execv");   /* reached only if execv fails */
    return 1;
}
The results were unchanged.
With all of this said, I agree that this issue does seem like an oversight or buglet in the standards. I'm not aware of any clause guaranteeing that the strings pointed to by argv are distinct, meaning that a paranoidly-written program probably can't depend on such a guarantee, no matter how likely it is (as this answer demonstrates) that any reasonable implementation does it that way.