
When calling the exec*() family of functions, do the char* elements of argv all have to be unique?

Tags: c, posix, exec

I'm trying to write a small utility that relays its argument list to an exec'd process, except some of the incoming arguments are repeated when building the new process's argument list.

Below is a very simplified version of what I'm looking to do, which simply duplicates each argument once:

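#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PROG "ls"

int main(int argc, char* argv[] ) {

    int progArgCount = (argc-1)*2;
    char** execArgv = malloc(sizeof(char*)*(progArgCount+2)); // +2 for PROG and final 0
    if (!execArgv) {
        perror("malloc");
        return EXIT_FAILURE;
    }
    execArgv[0] = PROG;
    for (int i = 0; i<progArgCount; ++i)
        execArgv[i+1] = argv[i/2+1]; // each incoming argument is used twice
    execArgv[progArgCount+1] = 0;

    execvp(PROG, execArgv );
    perror("execvp"); // only reached if execvp() fails
    return EXIT_FAILURE;

} // end main()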

Notice how the elements of execArgv are not unique. Specifically, the two elements in each duplication are the same, meaning they point to the same address in memory.
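To make the aliasing concrete, here is a minimal sketch of the same construction pulled out into a function, so the shared pointers can be inspected without exec'ing anything. The name build_dup_argv is hypothetical and used only for illustration; the logic mirrors the loop above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: builds {prog, a1, a1, a2, a2, ..., NULL} from
   argv[1..argc-1]. Only the pointer array is allocated; the strings
   themselves are shared with the caller's argv. */
char **build_dup_argv(char *prog, int argc, char *argv[])
{
    assert(argc >= 1);

    int n = (argc - 1) * 2;
    char **out = malloc(sizeof *out * (size_t)(n + 2)); /* +2: prog, NULL */
    if (out == NULL)
        return NULL;

    out[0] = prog;
    for (int i = 0; i < n; ++i)
        out[i + 1] = argv[i / 2 + 1]; /* slots 2k+1 and 2k+2 share a pointer */
    out[n + 1] = NULL;
    return out;
}
```

With two incoming arguments, slots 1 and 2 of the result hold the same address, as do slots 3 and 4.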

Does Standard C say anything about this usage? Is it incorrect, or undefined behavior? If not, is it still inadvisable, since the exec'd program might depend on the uniqueness of its argv elements? Please correct me if I'm wrong, but isn't it possible for programs to modify their argv elements directly, since they're non-const? Wouldn't that create a risk of the exec'd program blithely modifying argv[1] (say) and then accessing argv[2], falsely assuming that the two elements point to independent strings? I'm pretty sure I did this myself a few years ago when I was beginning to learn about C/C++, and I don't think it occurred to me at that time that the argv elements might not be unique.
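The hazard described above can be sketched inside a single process, with no exec involved. This only shows what would happen if two slots of an argument array shared storage; it does not show that an exec'd program ever sees such an array. The function name uppercase_in_place is made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Uppercases s in place, the way a careless program might edit argv[1]. */
void uppercase_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}
```

If args[1] and args[2] both point at the same buffer, calling uppercase_in_place(args[1]) also changes what args[2] reads as. If they point at separate buffers with equal contents, args[2] is unaffected.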

I know that exec'ing involves "replacement of the process image", but I'm not sure what that entails exactly. I can imagine that it might involve deepcopying the given argv argument (execArgv in my example above) to fresh allocations of memory, which would probably uniquify the thing, but I don't know enough about the internals of the exec functions to say. And it would be wasteful, at least if the original data structure could instead be preserved across the "replacement" operation, so that's a reason for me to doubt that it happens. And perhaps different platforms/implementations behave differently in this respect? Can answerers please speak to this?


I tried to find documentation on this question, but I was only able to find the following, from http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html:

The arguments specified by a program with one of the exec functions shall be passed on to the new process image in the corresponding main() arguments.

The above doesn't clarify if it is a uniquified deepcopy of the arguments that is passed on to the new process, or not.

The argument argv is an array of character pointers to null-terminated strings. The application shall ensure that the last member of this array is a null pointer. These strings shall constitute the argument list available to the new process image. The value in argv[0] should point to a filename that is associated with the process being started by one of the exec functions.

Ditto for the above.

The argv[] and envp[] arrays of pointers and the strings to which those arrays point shall not be modified by a call to one of the exec functions, except as a consequence of replacing the process image.

I honestly don't know how to interpret the above. "Replacing the process image" is the entire point of the exec functions! If it's going to modify the array or the strings, then that would constitute a "consequence of replacing the process image", in one sense or another. This almost implies that the exec functions will modify argv. This excerpt simply reinforces my confusion.

The statement about argv[] and envp[] being constants is included to make explicit to future writers of language bindings that these objects are completely constant. Due to a limitation of the ISO C standard, it is not possible to state that idea in standard C. Specifying two levels of const-qualification for the argv[] and envp[] parameters for the exec functions may seem to be the natural choice, given that these functions do not modify either the array of pointers or the characters to which the function points, but this would disallow existing correct code. Instead, only the array of pointers is noted as constant. The table of assignment compatibility for dst= src derived from the ISO C standard summarizes the compatibility:

It's not clear what "The statement about argv[] and envp[] being constants" refers to; my leading theory is that it refers to the const-qualification of the parameters in the prototypes given at the top of the documentation page. But since those qualifiers only mark the pointers, and not the char data, it hardly makes explicit "that these objects are completely constant". Secondly, I don't know why the paragraph talks about "writers of language bindings"; bindings to what? How is that relevant to a general documentation page on the exec functions? Thirdly, the main thrust of the paragraph just seems to be saying that we are stuck with leaving the actual char content of the strings pointed to by the argv elements as non-const for the sake of backwards compatibility with the established ISO C standard and "existing correct code" that conforms to it. This is confirmed by the table which follows on the documentation page, which I will not quote here. None of this decisively answers my primary questions, although it does state fairly clearly in the middle of the excerpt that the exec functions, in themselves, do not modify the given argv object in any way.


I would greatly appreciate information pertaining to my primary questions as well as commentary on my interpretations and comprehension of the quoted documentation excerpts (particularly, if my interpretations are wrong in any way). Thanks!

bgoldst asked Dec 14 '17 14:12



2 Answers

There are a lot of questions buried in your post, so I'll only address the most important parts of it (IMO):

Does Standard C say anything about this usage? Is it incorrect, or undefined behavior?

If by "standard C" you mean POSIX, then you've already found the specification for exec*. If it doesn't mandate that the arguments need to be distinct, then they don't need to be distinct.

And as pointed out by @SomeProgrammerDude in the comments, one is very likely to get non-distinct strings in the case of string literals, as the compiler is free to deduplicate them (e.g. in execl("foo", "bar", "foo", (char *)0), both "foo" literals may share one address).

is it still inadvisable, since the exec'd program might depend on the uniqueness of its argv elements?

The C standard itself does not mandate distinct strings in argv, so one can't rely on them being distinct.
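A program that intends to edit its arguments and does not want to depend on how they were laid out can take private copies first. The following is a minimal sketch under that assumption; copy_args is a hypothetical helper, not part of any standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a NULL-terminated array holding a private
   copy of each of argv[0..argc-1], or NULL if an allocation fails.
   The caller frees each string and then the array. */
char **copy_args(int argc, char *argv[])
{
    assert(argc >= 0);

    char **out = malloc(sizeof *out * ((size_t)argc + 1));
    if (out == NULL)
        return NULL;

    for (int i = 0; i < argc; ++i) {
        size_t len = strlen(argv[i]) + 1; /* include the terminator */
        out[i] = malloc(len);
        if (out[i] == NULL) {
            while (i-- > 0) /* undo the copies made so far */
                free(out[i]);
            free(out);
            return NULL;
        }
        memcpy(out[i], argv[i], len);
    }
    out[argc] = NULL;
    return out;
}
```

After this, writing through one element of the result cannot affect another element or the original strings, whether or not the originals were aliased.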

The above doesn't clarify if it is a uniquified deepcopy of the arguments

We can say for certain that copies must be made somehow, as otherwise there'd be the possibility of modifying string literals (which isn't allowed).

However, the details of how this is achieved seem to be left as an implementation choice. So it's probably best not to rely on any particular behaviour.

Oliver Charlesworth answered Sep 28 '22 15:09


Nowhere does the POSIX manual mandate that the arguments in argv be unique. It requires only that the arguments be null-terminated strings and that the list end with a null pointer (passed as the final argument in the variadic forms):

The arguments represented by arg0,... are pointers to null-terminated character strings. These strings shall constitute the argument list available to the new process image. The list is terminated by a null pointer. The argument arg0 should point to a filename string that is associated with the process being started by one of the exec functions.

The argument argv is an array of character pointers to null-terminated strings. The application shall ensure that the last member of this array is a null pointer. These strings shall constitute the argument list available to the new process image. The value in argv[0] should point to a filename string that is associated with the process being started by one of the exec functions.

That is all POSIX requires, so there is no explicit requirement that the arguments be unique. An implementation that required unique arguments would conflict with the standard, because standard functions cannot impose unspecified requirements or have effects that the standard does not specify.

"Replacing the process image" is the entire point of the exec functions! If it's going to modify the array or the strings, then that would constitute a "consequence of replacing the process image", in one sense or another. This almost implies that the exec functions will modify argv.

Modification is permitted only as a consequence of a successful exec. If the call fails, the process image is not replaced, so there are no "consequences" and argv and envp must be left untouched. The wording essentially guarantees that a failed exec call does not leave argv and envp in an unusable state in the original process.
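The failure case can be checked directly. This is a minimal sketch that assumes the path /nonexistent/missing-program does not exist on the system, so that execv returns instead of replacing the process; the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Calls execv on a path assumed not to exist, then reports whether the
   pointer array and the strings it refers to are unchanged.
   Returns 1 if everything is intact, 0 otherwise. */
int failed_exec_leaves_argv_intact(void)
{
    char name[] = "missing";
    char arg[] = "argument";
    char *args[] = {name, arg, arg, NULL};           /* slots 1 and 2 alias */
    char *saved[] = {args[0], args[1], args[2], args[3]};

    int rc = execv("/nonexistent/missing-program", args);
    assert(rc == -1); /* execv returns only on failure */

    for (int i = 0; i < 4; ++i)
        if (args[i] != saved[i])
            return 0;
    return strcmp(name, "missing") == 0 && strcmp(arg, "argument") == 0;
}
```

On a conforming implementation this is expected to return 1, since the quoted text forbids modification except as a consequence of replacing the process image.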

exec can't do a shallow copy because there's no way for it to know about the storage duration of the arguments it's given. So even the following should be fine:

char *p = "argument";
execvp("cmd", (char *[]){"cmd", p, p + 2, (char*)0});
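As a rough way to observe this, the sketch below runs echo with both aliased and overlapping argument pointers and captures what the child prints. It assumes an echo executable is on PATH and that it prints its arguments separated by single spaces; the function name echo_with_shared_args is made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `echo` with aliased and overlapping argument pointers and captures
   its standard output into buf as a string.
   Returns the number of bytes captured, or -1 on error. */
ssize_t echo_with_shared_args(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    char name[] = "echo";
    char text[] = "argument";
    char *args[] = {name, text, text, text + 2, NULL};

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: stdout -> pipe, then exec */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
        }
        execvp("echo", args);
        _exit(127);                       /* reached only if execvp failed */
    }

    close(fds[1]);                        /* parent: read until EOF */
    size_t used = 0;
    ssize_t n;
    while (used < size - 1 &&
           (n = read(fds[0], buf + used, size - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)used;
}
```

The child receives three arguments, "argument", "argument" and "gument", even though in the parent all three pointers refer into the same array.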
P.P answered Sep 28 '22 15:09