Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Program output changes when piped

Tags:

c

linux

unix

I have a simple C program to time process startup (I'd rather not post the full code as it's an active school assignment). My main function looks like this:

int main(void) {
  int i;

  for (i = 0; i < 5; i++) {
    printf("%lf\n", sample_time());
  }

  exit(0);
}

sample_time() is a function that times how long it takes to fork a new process and returns the result in seconds as a double. The part of sample_time() that forks:

double sample_time() {
  // timing stuff

  if (!fork()) exit(0); // immediately close new process

  // timing stuff

  return a_sample_time;
}

As expected, running the program, times, in the terminal outputs 5 numbers like so:

$ ./times
0.000085
0.000075
0.000079
0.000071
0.000078

However, trying to redirect this into a file (or anywhere else) in a Unix terminal produces unexpected results.

For example, ./times > times.out creates a file with fifteen numbers. Additionally, ./times | wc -l outputs 15, confirming the earlier result. Running ./times | cat, I again see fifteen numbers, more than five of which are distinct.

Does anyone know what on earth can cause something like this? I'm out of ideas.

./times != ./times | cat. Wat.

like image 591
John Zeringue Avatar asked Jan 24 '14 21:01

John Zeringue


1 Answers

Prerequisite knowledge

  • Fact 1 - When stdout is connected to a TTY it is line buffered. When it's connected to a file or a pipeline it is full buffered. This means it's only flushed every 8KB, say, rather than every line.

  • Fact 2 - Forked processes have duplicate copies of in-memory data. This includes stdio's output buffers if the data hasn't been flushed yet.

  • Fact 3 - Calling exit() causes stdio's output buffers to be flushed before the program exits.

Case 1: Output to terminal

When your program prints to the terminal its output is line buffered. Each printf() call that ends with \n immediately prints. This means each line is printed and the in-memory output buffer is emptied before fork() runs.

Result: 5 lines of output.

Case 2: Output to pipeline or file

When libc sees that stdout isn't connected to a TTY it switches to a more efficient full buffering strategy. This causes output to be buffered until 4KB worth have accumulated. That means the output from the printf()s is saved in memory, and calls to write() are deferred.

if (!fork()) exit(0);

After forking, the child process has a copy of the buffered output. The exit() call then causes that buffer to be flushed. This doesn't affect the parent process, though. Its output is still buffered.

Then when the second line of output is printed, it has two lines buffered. The next child process forks, exits, and prints those two lines. The parent retains its two lines of output, and so on.

Result: The child processes print 0, 1, 2, 3, and 4 lines of output. The main program prints 5 when it finally exits and flushes its output. 0 + 1 + 2 + 3 + 4 + 5 = 15. 15 lines of output instead of 5!

Solutions

  1. Call _Exit() instead of exit(). The function _Exit() is like exit(), but does not call any functions registered with atexit(). This would be my preferred solution.

  2. Explicitly set stdout to be line buffered: setvbuf(stdout, NULL, _IOLBF, 0);

  3. Call fflush(stdout) after each printf.

like image 84
John Kugelman Avatar answered Nov 16 '22 00:11

John Kugelman