Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Program stuck, pipe file descriptor open when shouldn't?

I am creating a small shell that can read commands. When I run my program and type:"cat file.txt > file2.txt" it creates the file and then it gets stuck at the line: if(execvp(structVariables->argv[0], argv) < 0). (waiting for input/output??). If I end the program with ctrl + d I can see in my folder that the file is created but nothing has been written in it. (dupPipe is used to handle more commands, not yet used because of problem described above)

if((pid = fork()) < 0)
{
        perror("fork error");
}
else if(pid > 0)        // Parent
{
        if(waitpid(pid,NULL,0) < 0)
        {
                perror("waitpid error");
        }
}
else                    // Child
{    
        int flags = 0;

        if(structVariables->outfile != NULL)
        {
                flags = 1;      // Write
                redirect(structVariables->outfile, flags, STDOUT_FILENO);
        }
        if(structVariables->infile != NULL)
        {
                flags = 2;      // Read
                redirect(structVariables->infile, flags, STDIN_FILENO);
        }

        if(execvp(structVariables->argv[0], argv) < 0)
        {
                perror("execvp error");
                exit(EXIT_FAILURE);
        }
}

The two functions I use in my program looks like this: dupPipe and redirect

int dupPipe(int pip[2], int end, int destinfd)
{
    if(end == READ_END)
    {
       dup2(pip[0], destinfd);
       close(pip[0]);
    }
    else if(end == WRITE_END)
    {
       dup2(pip[1], destinfd);
       close(pip[1]);
    }

    return destinfd;
}

int redirect(char *filename, int flags, int destinfd)
{
        int newfd;

        if(flags == 1)
        {
                if(access(filename, F_OK) != -1)        // If file already exists
                {
                        errno = EEXIST;
                        printf("Error: %s\n", strerror(errno));
                        return -1;
                }

                newfd = open(filename, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
                if(newfd == -1)
                {
                        perror("Open for write failed");
                        return -1;
                }
        }
        else if(flags == 2)
        {
                newfd = open(filename, O_RDONLY);
                if(newfd == -1)
                {
                        perror("Open for read failed");
                        return -1;
                }
        }
        else
                return -1;

        if(dup2(newfd, destinfd) == -1)
        {
                perror("dup2 failed");
                close(newfd);
                return -1;
        }
        if(newfd != destinfd)
        {
                close(newfd);
        }

        return destinfd;
}
like image 375
Fjodor Avatar asked Jul 20 '15 14:07

Fjodor


2 Answers

It appears that you're trying to code a shell to run commands read from input (if this is not the case; please edit your question, as it is not clear).

I am not sure why you think pipes are used in a command like cat file.txt > file2.txt, but in any case, they are not. Let's see what happens under the hood when you type cat file.txt > file2.txt in a shell like bash:

  1. A child process is created where cat(1) will run.
  2. The child process opens file2.txt for writing (more on this later).
  3. If open(2) is successful, the child process duplicates the newly opened file descriptor onto stdout (so stdout will effectively point to the same file table entry as file2.txt).
  4. cat(1) is executed by calling one of the seven exec() functions. The argument file.txt is passed to cat(1), so cat(1) will open file.txt and read everything, copying its contents to stdout (which is redirected to file2.txt).
  5. cat(1) finishes execution and terminates, which causes any open file descriptors to be closed and flushed. By the time cat(1) terminates, file2.txt is a copy of file.txt.
  6. Meanwhile, the parent shell process waits for the child to terminate before printing the next prompt and waiting for more commands.

As you can see, pipes are not used in I/O redirection. Pipes are an interprocess communication mechanism used to feed the output of a process into the input of another process. You only have one process running here (cat), so why would you even need pipes?

This means that you should be calling redirect() with STDOUT_FILENO as destinfd (instead of a pipe channel) for output redirection. Similarly, input redirection should call redirect() with STDIN_FILENO. These constants are defined in unistd.h so make sure to include that header.

You also probably want to exit on the child if exec() failed, or else you'll be running 2 copies of the shell process.

Last but not least, you shouldn't make input or output redirection exclusive. It might be the case that the user wants both input and output redirection. So instead of else if when doing I/O redirection I would just use 2 independent ifs.

With that in mind, the main code you posted should look something like:

if((pid = fork()) < 0)
{
        perror("fork error");
}
else if(pid > 0)        // Parent
{
        if(waitpid(pid,NULL,0) < 0)
        {
                perror("waitpid error");
        }
}
else                    // Child
{    
        int flags = 0;

        if(structVariables->outfile != NULL)
        {
                flags = 1;      // Write
                // We need STDOUT_FILENO here
                redirect(structVariables->outfile, flags, STDOUT_FILENO);
        }
        if(structVariables->infile != NULL)
        {
                flags = 2;      // Read
                // Similarly, we need STDIN_FILENO here
                redirect(structVariables->infile, flags, STDIN_FILENO);
        }

        // This line changed; see updated answer below
        if(execvp(structVariables->argv[0], structVariables->argv) < 0)
        {
                perror("execvp error");
                // Terminate
                exit(EXIT_FAILURE);
        }
}

As mentioned in another answer, your redirect() function is prone to race conditions because there is a window of time between the file existence check and the actual file creation where another process could create the file (this is known as a TOCTTOU error: Time Of Check To Time Of Use). You should use O_CREAT | O_EXCL to atomically test for existence and create the file.

Another problem is that you always close newfd. What if newfd and destinfd happen to be the same, for some reason? Then you'll be mistakenly closing the file, because dup2(2) is essentially a no-op if you pass in two equal file descriptors. Even if you think this will never happen, it is always good practice to check first if the duplicated fd is different from the original fd before closing the original.

Here's the code with these issues addressed:

int redirect(char *filename, int flags, int destinfd)
{
        int newfd;

        if(flags == 1)
        {
                newfd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
                if(newfd == -1)
                {
                        perror("Open for write failed");
                        return -1;
                }
        }
        else if(flags == 2)
        {
                newfd = open(filename, O_RDONLY);
                if(newfd == -1)
                {
                        perror("Open for read failed");
                        return -1;
                }
        }
        else
                return -1;

        if(dup2(newfd, destinfd) == -1)
        {
                perror("dup2 failed");
                close(newfd);
                return -1;
        }

        if (newfd != destinfd)
            close(newfd);

        return destinfd;
}

Consider replacing 0666 in open(2) above with S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH (make sure to include sys/stat.h and fcntl.h). You might want to use a #define to make it cleaner, but I still think it's better and more descriptive if you do it like that rather than hardcoding some magic number (this is subjective though).

I won't comment on dupPipe() since it's not needed / used in this question. I/O redirection is all you need. If you want to extend the discussion to pipes, feel free to edit the question or create another one.

UPDATE

Ok, now that I had a look at the full source code, I have a couple more remarks.

The reason cat(1) is hanging is because of this:

if (execvp(structVariables->argv[0], argv) < 0)

The second parameter to execvp(2) should be structVariables->argv, not argv, because argv is the argument array of the shell program, which is (usually) empty. Passing an empty argument list to cat(1) makes it read from stdin rather than from a file, so that's why it seems to hang - it's waiting for you to feed input. So, go ahead and replace that line with:

if (execvp(structVariables->argv[0], structVariables->argv) < 0)

This solves one of your issues: things like cat < file.txt > file2.txt will now work (I tested it).

About pipe redirection

So now we need to work on pipe redirection. Pipe redirection happens every time we see | on the command line. Let's work through an example to understand what happens under the hood when we type ls | grep "file.txt" | sort. It is important to understand these steps so that you can build an accurate mental model of how the system works; without such a vision, you will not really understand the implementation:

  1. The shell (typically) splits the commands by the pipe symbol first. This is also what your code does. This means that after parsing, the shell has gathered enough information and the command line was split into 3 entities (the ls command, the grep command and the sort command).
  2. The shell forks and calls one of the seven exec() functions on the child to run ls. Now, remember that piping means that the output of a program is the input of the next, so before exec()ing, the shell has to create a pipe. The child process that is about to run ls(1) calls dup2(2) before exec() to duplicate the pipe's write channel onto stdout. Similarly, the parent process calls dup2(2) to duplicate the pipe's read channel onto stdin. It is very important that you understand this step: because the parent duplicate the pipe's read end to stdin, then whatever we do next (e.g. fork again to execute more commands) will always read input from the pipe. So, at this point, we have ls(1) writing to stdout, which is redirected to a pipe that is read by the shell's parent process.

  3. The shell will now execute grep(1). Again, it forks a new process to execute grep(1). Remember that file descriptors are inherited across a fork, and that the parent shell's process has stdin tied to the read end of the pipe connected to ls(1), so the new child process that is about to execute grep(1) will "automagically" read from that pipe! But wait, there's more! The shell knows that there is yet another process in the pipeline (the sort command), so before executing grep (and before forking), the shell creates another pipe to connect the output of grep(1) to the input of sort(1). Then, it repeats the same steps: on the child process, the pipe's write channel is duplicated onto stdout. In the parent, the pipe's read channel is duplicated onto stdin. Again, it is important to really understand what is happening here: the process that is about to execute grep(1) already had its input being read from the pipe connected to ls(1), and now it has the output connected to the pipe that will feed sort(1). So grep(1) is essentially reading from a pipe and writing to a pipe. OTOH, the parent shell process duplicated the last pipe's read channel to stdin, effectively "giving up" from reading the output of ls(1) (because grep(1) will handle it anyway), but instead updating the input stream to read the results from grep(1).

  4. Finally, the shell sees that sort(1) is the last command, so it just forks + execs sort(1). The results are written to stdout, because we never changed stdout in the shell process, but input is read from the pipe that connects grep(1) to sort(1) because of our actions in step 3.

So how is this implemented?

Simple: as long as there is more than one command left to process, we create a pipe and fork. On the child, we close the pipe's read channel, duplicate the pipe's write channel onto stdout, and call one of the seven exec() functions. On the parent, we close the pipe's write channel, and duplicate the pipe's read channel onto stdin.

When there is only one command left to process, we simply fork + exec, without creating a pipe.

There is only one last detail to clarify: before starting the pipe(2) redirection party, we need to store a reference to the original shell's standard input, since we will (possibly) change it many times throughout the way. If we didn't save it, we might lose the reference to the original stdin file, and then we wouldn't be able to read user input anymore! In the code, I usually do this with fcntl(2) with the F_DUPFD_CLOEXEC (see man 2 fcntl), to ensure that the descriptor is closed when a command is executed in the child process (it is usually bad practice to leave open file descriptors around when they are used).

Also, the shell process needs to wait(2) on the last process in the pipeline. If you think about it, it makes sense: pipes inherently synchronize each command in the pipeline; the set of commands is assumed to be over only when the last command reads EOF from the pipe (that is, we know that we're done only when all of the data flowed throughout the entire pipeline). If the shell didn't wait for the last process, but instead waited for some other process in the middle (or in the beginning) of the pipeline, it would return back to the command prompt too soon and leave other commands still running on the background - not a smart move, as the user expects the shell to finish executing the current job before waiting for more.

So ... this is a lot of information, but it is really important that you understand it. So the revised main code is here:

int saved_stdin = fcntl(STDIN_FILENO, F_DUPFD_CLOEXEC, 0);

if (saved_stdin < 0) {
    perror("Couldn't store stdin reference");
    break;
}

pid_t pid;
int i;
/* As long as there are at least two commands to process... */
for (i = 0; i < n-1; i++) {
    /* We create a pipe to connect this command to the next command */
    int pipefds[2];

    if (pipe(pipefds) < 0) {
        perror("pipe(2) error");
        break;
    }

    /* Prepare execution on child process and make the parent read the
     * results from the pipe
     */
    if ((pid = fork()) < 0) {
        perror("fork(2) error");
        break;
    }

    if (pid > 0) {
        /* Parent needs to close the pipe's write channel to make sure
         * we don't hang. Parent reads from the pipe's read channel.
         */

        if (close(pipefds[1]) < 0) {
            perror("close(2) error");
            break;
        }

        if (dupPipe(pipefds, READ_END, STDIN_FILENO) < 0) {
            perror("dupPipe() error");
            break;
        }
    } else {

        int flags = 0;

        if (structVariables[i].outfile != NULL)
        {
            flags = 1;      // Write
            if (redirect(structVariables[i].outfile, flags, STDOUT_FILENO) < 0) {
                perror("redirect() error");
                exit(EXIT_FAILURE);
            }
        }
        if (structVariables[i].infile != NULL)
        {
            flags = 2;      // Read
            if (redirect(structVariables[i].infile, flags, STDIN_FILENO) < 0) {
                perror("redirect() error");
                exit(EXIT_FAILURE);
            }
        }

        /* Child writes to the pipe (that is read by the parent); the read
         * channel doesn't have to be closed, but we close it for good practice
         */

        if (close(pipefds[0]) < 0) {
            perror("close(2) error");
            break;
        }

        if (dupPipe(pipefds, WRITE_END, STDOUT_FILENO) < 0) {
            perror("dupPipe() error");
            break;
        }

        if (execvp(structVariables[i].argv[0], structVariables[i].argv) < 0) {
            perror("execvp(3) error");
            exit(EXIT_FAILURE);
        }
    }
}

if (i != n-1) {
    /* Some error caused an early loop exit */
    break;
}

/* We don't need a pipe for the last command */
if ((pid = fork()) < 0) {
    perror("fork(2) error on last command");
}

if (pid > 0) {
    /* Parent waits for the last command to execute */
    if (waitpid(pid, NULL, 0) < 0) {
        perror("waitpid(2) error");
    }
} else {
    int flags = 0;
    /* Execute last command. This will read from the last pipe we set up */
    if (structVariables[i].outfile != NULL)
    {
        flags = 1;      // Write
        if (redirect(structVariables[i].outfile, flags, STDOUT_FILENO) < 0) {
            perror("redirect() error");
            exit(EXIT_FAILURE);
        }
    }
    if (structVariables[i].infile != NULL)
    {
        flags = 2;      // Read
        if (redirect(structVariables[i].infile, flags, STDIN_FILENO) < 0) {
            perror("redirect() error");
            exit(EXIT_FAILURE);
        }
    }
    if (execvp(structVariables[i].argv[0], structVariables[i].argv) < 0) {
        perror("execvp(3) error on last command");
        exit(EXIT_FAILURE);
    }
}

/* Finally, we need to restore the original stdin descriptor */
if (dup2(saved_stdin, STDIN_FILENO) < 0) {
    perror("dup2(2) error when attempting to restore stdin");
    exit(EXIT_FAILURE);
}
if (close(saved_stdin) < 0) {
    perror("close(2) failed on saved_stdin");
}

Some final remarks about dupPipe():

  • Both dup2(2) and close(2) may return an error; you should probably check for this and act accordingly (i.e. pass the error up the call stack by returning -1).
  • Again, you shouldn't blindly close the descriptor after you duplicate it, because it may be the case that the source and destination descriptors are the same.
  • You should validate that end is either READ_END or WRITE_END, and return an error if that is not true (instead of returning destinfd no matter what, which may give a false sense of success to the caller code)

Here's how I would improve it:

int dupPipe(int pip[2], int end, int destinfd)
{
    if (end != READ_END && end != WRITE_END)
        return -1;

    if(end == READ_END)
    {
        if (dup2(pip[0], destinfd) < 0)
            return -1;
        if (pip[0] != destinfd && close(pip[0]) < 0)
            return -1;
    }
    else if(end == WRITE_END)
    {
        if (dup2(pip[1], destinfd) < 0)
            return -1;
        if (pip[1] != destinfd && close(pip[1]) < 0)
            return -1;
    }

    return destinfd;
}

Have fun with your shell!

like image 80
Filipe Gonçalves Avatar answered Sep 21 '22 19:09

Filipe Gonçalves


execvp does not return unless there is an error.

Therefore, the originating program will (normally) not execute code beyond the call to execvp()

the normal sequence of code is:

1) fork()
2) if child then call execvp(); 
3) if parent ....
like image 20
user3629249 Avatar answered Sep 20 '22 19:09

user3629249