fgets() call with redirected stdin gets an abnormal data stream

I am writing a shell in C. Here is the source code:

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <stdlib.h>

int
getcmd(char *buf, int nbuf)
{
  memset(buf, 0, nbuf);
  fgets(buf, nbuf, stdin);
  printf("pid: %d, ppid: %d\n", getpid(), getppid());
  printf("buf: %s", buf);
  if(buf[0] == 0) {// EOF
    printf("end of getcmd\n");
    return -1;
  }
  return 0;
}

int
main(void)
{
  static char buf[100];
  int fd, r, ret;

  // Read and run input commands.
  while((ret = getcmd(buf, sizeof(buf))) >= 0){
    if(fork() == 0)
      exit(0);
    wait(&r);
  }
  exit(0);
}

When I run the compiled executable with stdin redirected from a file named t.sh, whose content is "1111\n2222\n", as in ./myshell < t.sh, the output is:

pid: 2952, ppid: 2374
buf: 1111
pid: 2952, ppid: 2374
buf: 2222
pid: 2952, ppid: 2374
buf: 2222
pid: 2952, ppid: 2374
buf: end of getcmd

Obviously, getcmd() gets 3 lines (1111, 2222, 2222), while there are only 2 lines in t.sh. The situation gets even worse when t.sh contains more lines.

The main process is the only process that executes getcmd(), as the pid in the output shows.

By the way, I found that if the line wait(&r) is removed, the output becomes normal.

asked Aug 13 '17 03:08 by sun



1 Answer

The wait ensures that the child process gets time to run before the parent finishes with the file. If I strace the program under Linux, I get

% strace -f ./a.out
[lots of stuff]
wait4(-1, strace: Process 29317 attached
 <unfinished ...>
[pid 29317] lseek(0, -2, SEEK_CUR)      = 0
[pid 29317] exit_group(0)               = ?
[pid 29317] +++ exited with 0 +++
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 29317
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=29317, si_uid=1000, si_status=0
    _utime=0, si_stime=0} ---
[lots of stuff]

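The child process rewinds the standard input as one of its first operations after the fork, and then promptly exits. Specifically, it seeks back by the number of bytes that fgets had already read into the stdio buffer but that were still unused. libc does this automatically, apparently as part of the stream cleanup that runs when the child calls exit(). Because parent and child share the same file offset, the parent then reads those bytes a second time once its own buffer is empty. I also saw the child process flushing stdout.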
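To observe the effect in isolation, here is a minimal sketch. The helper name count_lines_with_fork and its flag are made up for illustration. It reads a stream line by line and forks after every line; the child ends with either exit(), which runs the stdio cleanup, or _exit(), which skips it. The assumption is that with _exit() the shared file offset is never touched, so the parent sees each line exactly once.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read `in` line by line, forking a child after each
 * line. The child terminates with exit() or _exit() depending on the flag.
 * Returns the number of lines the parent read, or -1 if fork() fails. */
int
count_lines_with_fork(FILE *in, int child_uses_exit)
{
  char buf[100];
  int lines = 0;

  assert(in != NULL);
  while (fgets(buf, sizeof(buf), in) != NULL) {
    pid_t pid;

    lines++;
    pid = fork();
    if (pid < 0)
      return -1;
    if (pid == 0) {
      if (child_uses_exit)
        exit(0);  /* runs stdio cleanup; may move the shared file offset */
      _exit(0);   /* skips stdio cleanup; the offset is left alone */
    }
    waitpid(pid, NULL, 0);
  }
  return lines;
}
```

Called on a seekable two-line file with child_uses_exit set to 0, this should count 2 lines. With the flag set to 1, the count depends on the libc and can exceed 2, as in the question.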

I am not sure what to think about this, but clearly, if you want to write a shell, you must not use <stdio.h> on the standard streams at all. If the lseek didn't occur, the child process would find up to 4095 bytes of stdin skipped! Always use plain read and write from <unistd.h> instead. Alternatively, you might have some luck adding the following call at the beginning of main, before anything is read from stdin:

if (setvbuf(stdin, NULL, _IONBF, 0) != 0) {
    perror("setvbuf");
    exit(1);
}

This sets the stdin stream to unbuffered mode, so it shouldn't read more than it needs. Nevertheless, the Linux manual page for fgets says:

It is not advisable to mix calls to input functions from the stdio library with low-level calls to read(2) for the file descriptor associated with the input stream; the results will be undefined and very probably not what you want.
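As for the read-only route suggested above, a minimal sketch could look like the following. The helper name read_line_fd is made up for illustration. It reads one byte per read() call, which is slow but guarantees that nothing past the newline is consumed from the descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one line from fd into buf (at most nbuf-1 bytes),
 * one byte per read() so that nothing after the newline is consumed.
 * Returns the line length, 0 at EOF with no data, or -1 on error. */
ssize_t
read_line_fd(int fd, char *buf, size_t nbuf)
{
  size_t n = 0;

  assert(buf != NULL && nbuf > 0);
  while (n + 1 < nbuf) {
    char c;
    ssize_t r = read(fd, &c, 1);

    if (r < 0) {
      if (errno == EINTR)
        continue;       /* interrupted by a signal: retry */
      return -1;
    }
    if (r == 0)
      break;            /* end of input */
    buf[n++] = c;
    if (c == '\n')
      break;            /* stop right after the newline */
  }
  buf[n] = '\0';
  return (ssize_t)n;
}
```

In getcmd this would replace the fgets call, for example read_line_fd(0, buf, nbuf), with a return value of 0 or less taking the place of the buf[0] == 0 check.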

BTW, this cannot be reproduced if stdin comes from a pipe instead:

% echo -e '1\n2' | ./a.out  
pid: 498, ppid: 21285
buf: 1
pid: 498, ppid: 21285
buf: 2
pid: 498, ppid: 21285
buf: end of getcmd

But naturally that makes the other problem visible - that the child sees input being skipped.


P.S.

You never check the return value of fgets, so you do not know when a read error occurs. The C standard says of fgets:

If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
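A minimal sketch of a getcmd variant that relies on the return value of fgets rather than on the buffer contents. The name getcmd_checked, the FILE * parameter, and the -2 error code are additions for illustration, not part of the original program.

```c
#include <assert.h>
#include <stdio.h>

/* Variant of getcmd() that checks what fgets() returns.
 * Returns 0 when a line was read, -1 at end of input, -2 on a read error. */
int
getcmd_checked(FILE *in, char *buf, int nbuf)
{
  assert(in != NULL && buf != NULL && nbuf > 0);
  if (fgets(buf, nbuf, in) == NULL) {
    buf[0] = '\0';                /* contents are not reliable after failure */
    return ferror(in) ? -2 : -1;  /* tell a read error apart from EOF */
  }
  return 0;
}
```

The loop in main can keep its >= 0 test: both negative return values end the loop, and the caller can still tell them apart if it wants to report the error.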
