Maintaining a FIFO readable across different executions

I've never used a named pipe before, and I recently realized that it's just what I need.

I'm running a program under GNU parallel that could produce a huge amount of output (GBs, maybe up to 1 TB; it's hard to know right now), formatted for loading into a MySQL database.

I figured out that I can open two terminals. In terminal 1 I run something like:

find . -type f -name "*.h" | parallel --jobs 12 'cprogram {}' > /home/pipe

where /home/pipe is a FIFO made with mkfifo.

In a second terminal, I run a command similar to this:

mysql DataBaseName -e "LOAD DATA LOCAL INFILE '/home/pipe' INTO TABLE tableName";

It works...

But this feels janky... If I understand correctly, an EOF is generated when the first process ends, causing the pipe to close.

Ideally I want to run the first process in a loop with varying parameters. Each iteration could take a long time, and I need to make sanity checks so I don't lose a week only to find out I've got bugs or faulty logic.
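
Conceptually, each run would be something like this (just a sketch; the --param flag and the loop values stand in for my real options):

# Sketch only: --param and the loop values are placeholders for the real options.
for p in 1 2 3; do
    find . -type f -name "*.h" \
        | parallel --jobs 12 "cprogram --param $p {}" > /home/pipe
done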

I'd like to know how to use a FIFO for this kind of procedure in a standard way.

asked Oct 20 '22 by wbg

1 Answer

If I understand correctly, an EOF is generated when the first process ends, causing the pipe to close.

Sort of. There's a little bit more to it than that - it is technically incorrect to say that the pipe closes as soon as the first process ends.

Instead, pipes and FIFOs return EOF when there is no more data left in the pipe and it is not opened for writing by any process.
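
You can see this with a quick throwaway experiment (a scratch /tmp/demo FIFO, used here purely for illustration):

mkfifo /tmp/demo
echo hello > /tmp/demo &   # writer: opens the FIFO, writes one line, then closes it
cat /tmp/demo              # prints "hello", then gets EOF because no writer is left
rm /tmp/demo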

Usually, this is solved by having the reader process open the FIFO both for reading and for writing, even though it will never write. For example, a server that accepts local clients by reading from a FIFO can open the FIFO for reading and writing, so that when there are no active clients it doesn't have to deal with the special case of EOF. This is the "standard" way to deal with it, as outlined in Advanced Programming in the UNIX Environment in the chapter about IPC mechanisms.
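
In shell terms, that pattern would look something like this minimal sketch (assuming a bash-like shell; /tmp/server.fifo is just an example path):

# Sketch: the reader opens the FIFO for reading *and* writing (<>), so it always
# counts as a writer itself and read never returns EOF just because all clients
# have closed their end of the pipe.
mkfifo /tmp/server.fifo
exec 3<> /tmp/server.fifo
while read -r request <&3; do
    echo "handling request: $request"
done

A client would then simply write lines to the FIFO (e.g. echo 'do something' > /tmp/server.fifo), and the server blocks waiting for the next request instead of seeing EOF between clients.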

In your case though, this is really not possible, because you have no permanent process that keeps running (that is, you don't have the equivalent of a server process). You basically need some sort of "persistent writer", i.e., a process that keeps the pipe open for writing across the different iterations.

One solution I can think of is to cat standard input to the FIFO in the background. This ensures that cat opens the FIFO for writing, so there is always an active writer, but by keeping it in the background you don't actually feed it any input, so it never writes to the FIFO.

Just be aware that the job will be stopped (but not terminated) by the shell as soon as cat attempts to read from stdin: processes in a background process group are usually sent SIGTTIN and stopped when they try to read from the terminal, because they are not the terminal's foreground process group until they are brought to the foreground. Anyway, as long as you don't feed it any input, you're good: the process is in a stopped state, but the FIFO is still open for writing nonetheless. You'll never see an EOF on the pipe as long as the background job is not terminated.

So, in short, you do the following (a combined sketch follows the list):

  1. Create the FIFO: mkfifo /home/pipe
  2. Start a background job that opens the FIFO for writing: cat >/home/pipe &
  3. Run your programs however you want, for as many iterations as you want. Ignore the shell message about the background job being stopped. You can just leave it like that, since the pipe is still open for writing even though the job is stopped.
  4. When you're done, kill the background cat by either bringing it to the foreground and sending it SIGINT (usually, Ctrl+C) or with kill PID.
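
Putting it all together, an interactive session in terminal 1 could look roughly like this (a sketch reusing your paths and commands; the parameter loop and --param flag are placeholders):

# 1. Create the FIFO.
mkfifo /home/pipe

# 2. Keep the FIFO opened for writing for the whole run. In an interactive shell,
#    cat will be stopped on SIGTTIN when it tries to read the terminal; that's fine,
#    its write end of the FIFO stays open.
cat > /home/pipe &
KEEPALIVE_PID=$!

# In the second terminal, start the reader as before:
#   mysql DataBaseName -e "LOAD DATA LOCAL INFILE '/home/pipe' INTO TABLE tableName"

# 3. Run as many iterations as you want; the reader never sees EOF in between.
for p in 1 2 3; do
    find . -type f -name "*.h" \
        | parallel --jobs 12 "cprogram --param $p {}" > /home/pipe
done

# 4. When everything is done, terminate the keep-alive writer so the reader finally
#    gets EOF and can finish. SIGCONT is sent as well in case the job is still stopped,
#    so that the pending SIGTERM actually gets delivered.
kill "$KEEPALIVE_PID"
kill -CONT "$KEEPALIVE_PID"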

Note that by doing this the reader process (mysql in this case) will never know when the input is over. It will always block waiting for more input, unless you kill the background cat before killing mysql.

answered Oct 22 '22 by Filipe Gonçalves