
How to force a running program to flush the contents of its I/O buffers to disk with external means?

Tags: c, linux, io, gcc, flush

I have a long-running C program that opens a file at the beginning, writes out "interesting" stuff during execution, and closes the file just before it finishes. The code, compiled with gcc -o test test.c (gcc version 5.3.1), looks as follows:

//contents of test.c
#include<stdio.h>

FILE * filept;

int main() {
    filept = fopen("test.txt","w");
    unsigned long i;
    for (i = 0; i < 1152921504606846976; ++i) {
        if (i == 0) {//This case is interesting!
            fprintf(filept, "Hello world\n");
        }
    }
    fclose(filept);
    return 0;
}

The problem is that, since this is a scientific computation (think of searching for primes, or whatever your favourite hard-to-crack problem is), it could really run for a very long time. Since I have determined that I am not patient enough, I would like to abort the current computation, but I would like to do this in an intelligent way, by somehow forcing the program by external means to flush out all the data that is currently sitting in the OS buffer, the disk cache, or wherever else it may be.

Here is what I have tried (for this bogus program above, and of course not for the real deal which is currently still running):

  1. pressing ctrl+C; or
  2. sending kill -6 <PID> (and also kill -3 <PID>) -- as suggested by @BartekBanachewicz,

but after either of these approaches the file test.txt, created at the very beginning of the program, remains empty. This means that the output of fprintf() was left in some intermediate buffer during the computation, waiting for some OS/hardware/software flush signal, and since no such signal arrived, the contents disappeared. This also means that the comment made by @EJP

Your question is based on a fallacy. 'Stuff that is in the OS buffer/disk cache' won't be lost.

does not seem to apply here. Experience shows that stuff does indeed get lost.

I am using Ubuntu 16.04, and I am willing to attach a debugger to this process if that is possible and if it is safe to retrieve the data this way. Since I have never done such a thing before, I would appreciate a detailed answer on how to get the contents flushed to disk safely and surely. I am open to other methods as well. There is no room for error here, as I am not going to rerun the program.

Note: Sure I could have opened and closed a file inside the if branch, but that is extremely inefficient once you have many things to be written. Recompiling the program is not possible, as it is still in the middle of some computation.

Note2: the original version of this question asked the same thing in a slightly more abstract way related to C++, and was tagged as such (that is why people in the comments are suggesting std::flush(), which wouldn't help even if this were a C++ question). Well, I guess I made a major edit then.


Somewhat related: Will data written via write() be flushed to disk if a process is killed?

Matsmath asked May 18 '16 08:05


1 Answer

Can I just add some clarity? Obviously months have passed, and I imagine your program isn't running any more ... but there's some confusion here about buffering which still isn't clear.

As soon as you use the stdio library and FILE *, you will by default have a fairly small buffer inside your program (implementation dependent, but typically a few KB) which accumulates what you write and flushes it to the OS when it is full or when the file is closed. When you kill your process, it is this buffer that gets lost.
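To make the stdio buffer visible, here is a minimal sketch that writes one line and asks the OS for the file size before and after fflush(). The function names and the scratch file path are hypothetical, and the "before" size of 0 is only what a fully buffered stream on a typical glibc system would show, not something the C standard guarantees.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of `path` as the OS currently sees it, or -1 if stat() fails. */
long os_visible_size(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        return -1;
    }
    return (long)st.st_size;
}

/*
 * Write one line through stdio and record the OS-visible file size
 * before and after fflush(). Returns 0 on success, -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
int measure_buffering(const char *path, long *before, long *after) {
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        return -1;
    }
    fprintf(fp, "Hello world\n");    /* 12 bytes, normally still in the stdio buffer */
    *before = os_visible_size(path); /* what another process would see right now */
    fflush(fp);                      /* hand the buffered bytes to the OS */
    *after = os_visible_size(path);
    fclose(fp);
    remove(path);                    /* clean up the scratch file */
    return 0;
}
```

Calling measure_buffering("buffer_demo.txt", &before, &after) from main() and printing both values should show the 12 bytes appearing only after the fflush() call, which is the same effect that leaves test.txt empty when the process is killed.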

If the data has been flushed to the OS, it is kept in a kernel file buffer (the page cache) until the OS decides to persist it to disk (usually fairly soon) or someone runs the sync command. If you kill the power on your computer, this buffer gets lost as well. You probably don't care about that scenario, because you probably aren't planning to yank the power! But this is what @EJP was talking about (re: 'Stuff that is in the OS buffer/disk cache' won't be lost): your problem is the stdio buffer, not the OS.
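The two layers described above can be sketched as one helper that pushes data through both of them. This is only an illustration for a POSIX system: flush_to_disk is a hypothetical name, and fsync() is used here as the per-file counterpart of the sync command.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/*
 * Push a stream's pending data through both buffering layers:
 *   1. fflush(): stdio buffer inside the process -> OS file buffer
 *   2. fsync():  OS file buffer -> storage device
 * Returns 0 on success, -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
int flush_to_disk(FILE *fp) {
    if (fflush(fp) != 0) {
        return -1;
    }
    if (fsync(fileno(fp)) != 0) {
        return -1;
    }
    return 0;
}
```

For surviving a kill, step 1 alone is enough; step 2 only matters for the power-loss scenario.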

In an ideal world, you'd write your app so that it called fflush() (or std::flush() in C++) at key points. In your example, you'd say:

    if (i == 0) {//This case is interesting!
        fprintf(filept, "Hello world\n");
        fflush(filept);
    }

which would cause the stdio buffer to flush to the OS. I imagine your real writer is more complex, and in that situation I would try to make the fflush happen "often but not too often": flush too rarely and you lose data when you kill the process; flush too often and you lose the performance benefit of buffering if you are writing a lot.
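One way to get "often but not too often" in a future run is to flush on demand: a minimal sketch, assuming a POSIX system, in which a SIGUSR1 handler only sets a flag and the computation loop does the actual fflush(). All names here are hypothetical, and this has to be compiled in beforehand, so it cannot help a process that is already running without it.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction() */
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the signal handler, polled by the computation loop. */
static volatile sig_atomic_t flush_requested = 0;

static void on_sigusr1(int signo) {
    (void)signo;
    /* Only set a flag here: fflush() is not async-signal-safe. */
    flush_requested = 1;
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on failure. */
int install_flush_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Call this once per loop iteration (or every N iterations).
 * Flushes fp if a request is pending; returns 1 if it flushed, 0 otherwise.
 */
int flush_if_requested(FILE *fp) {
    if (!flush_requested) {
        return 0;
    }
    flush_requested = 0;
    fflush(fp);
    return 1;
}
```

With install_flush_handler() called at startup and flush_if_requested(filept) inside the loop, running kill -USR1 <PID> from another terminal would ask the program to flush at its next check, at the cost of one flag test per check.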

In the situation you describe, where the program is already running and can't be stopped and rewritten, your only hope, as you say, is to stop it in a debugger. The details of what you need to do depend on the implementation of the standard library, but you can usually look inside the FILE *filept object and start following pointers, messy though that is. @ivan_pozdeev's comment about executing fflush() (or std::flush() in C++) from within the debugger is helpful: calling the library's own flush routine avoids digging through the FILE internals by hand.

SusanW answered Nov 01 '22 09:11