Why doesn't piping to the same file work on some platforms?

In Cygwin, the following code works fine:

$ cat junk
bat
bat
bat

$ cat junk | sort -k1,1 |tr 'b' 'z' > junk

$ cat junk
zat
zat
zat

But in a GNU/Linux shell, overwriting the file this way doesn't seem to work; the file ends up empty:

[41] othershell: cat junk
cat
cat
cat
[42] othershell: cat junk |sort -k1,1 |tr 'c' 'z'
zat
zat
zat
[43] othershell: cat junk |sort -k1,1 |tr 'c' 'z' > junk
[44] othershell: cat junk

Both environments run Bash.

I am asking because this caveat sometimes forces me to create a temporary file after doing text manipulation. I know that Perl has the -i flag, which overwrites the original file after processing it. Is there a foolproof way, which I am not aware of, to overwrite a file from a Unix pipeline?

Alby asked May 14 '12 15:05


3 Answers

Four main points here:

  1. "Useless use of cat." Don't do that.
  2. You're not actually sorting anything with sort. Don't do that.
  3. Your pipeline doesn't say what you think it does. Don't do that.
  4. You're trying to overwrite a file in place while reading from it. Don't do that.

One of the reasons you are getting inconsistent behavior is that you are piping to a process that has redirection, rather than redirecting the output of the pipeline as a whole. The difference is subtle, but important.

What you want is to create a compound command with Command Grouping, so that you can redirect the input and output of the whole pipeline. In your case, this should work properly:

{ sort -k1,1 | tr 'c' 'z'; } < junk > sorted_junk

Please note that every line in your input is identical, so sort has nothing to reorder and you might as well skip it. Then your command can be run without the need for command grouping:

tr 'c' 'z' < junk > sorted_junk

Keep redirections and pipelines as simple as possible. It makes debugging your scripts much easier.

However, if you still want to abuse the pipeline for some reason, you could use the sponge utility from the moreutils package. The man page says:

sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

So, your original command line can be rewritten like this:

cat junk | sort -k1,1 | tr 'c' 'z' | sponge junk

and since junk will not be overwritten until sponge receives EOF from the pipeline, you will get the results you were expecting.
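To make the "soak up everything first" idea concrete, here is a rough sketch of the same behavior in plain shell. The function name soak and the scratch directory are hypothetical, made up for illustration; this is not how sponge itself is implemented, only an approximation of the ordering it guarantees.

```shell
# soak: hypothetical helper, not part of moreutils. It buffers all of stdin
# in a temporary file and only then opens the target file for writing.
soak() {
    soak_tmp=$(mktemp) || return 1
    cat > "$soak_tmp" && cat "$soak_tmp" > "$1"
    soak_status=$?
    rm -f -- "$soak_tmp"
    return "$soak_status"
}

# Scratch copy of the example file (assumed contents from the question).
scratch=$(mktemp -d)
printf 'cat\ncat\ncat\n' > "$scratch/junk"

# The target is not truncated until the upstream commands have closed the pipe.
sort -k1,1 "$scratch/junk" | tr 'c' 'z' | soak "$scratch/junk"
cat "$scratch/junk"
```

If moreutils is installed, sponge is the better choice; the sketch only shows why delaying the open of the output file avoids the problem.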

Todd A. Jacobs answered Nov 15 '22 06:11


In general, this can be expected to break. The processes in a pipeline are all started in parallel, so the > junk at the end of the line will usually truncate your input file before the process at the head of the pipeline has finished (or even started) reading from it.

Even if bash under Cygwin lets you get away with this, you shouldn't rely on it. The general solution is to redirect to a temporary file and then rename it over the original when the pipeline is complete.
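The truncation can be seen in isolation with a single command instead of a pipeline. In that case there is no race: the shell sets up both redirections before it runs the command, so the outcome is the same every time. The scratch directory and the variable name demo are assumptions for illustration.

```shell
# Scratch copy of the example file (assumed contents from the question).
demo=$(mktemp -d)
printf 'cat\ncat\ncat\n' > "$demo/junk"

# The shell opens the file for writing, which truncates it, while setting up
# the redirections. tr only starts afterwards and finds nothing to read.
tr 'c' 'z' < "$demo/junk" > "$demo/junk"

# Print the size in bytes; the file is now empty.
wc -c < "$demo/junk"
```

In a real pipeline, the reader and the truncating redirect belong to different processes, so which one wins depends on scheduling. That is why the result can differ between platforms.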
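A minimal sketch of the temporary-file approach, assuming the file contents from the question. The scratch directory and the variable names workdir and tmp are illustrative choices, not anything prescribed above.

```shell
# Scratch copy of the example file so the sketch is self-contained.
workdir=$(mktemp -d)
printf 'cat\ncat\ncat\n' > "$workdir/junk"

# Create the temporary file next to the target, so that the final mv is a
# rename within one filesystem rather than a copy.
tmp=$(mktemp "$workdir/junk.XXXXXX") || exit 1

# Replace the original only if the pipeline reports success. Without
# "set -o pipefail", that status is the exit status of the last command (tr).
if sort -k1,1 "$workdir/junk" | tr 'c' 'z' > "$tmp"; then
    mv -- "$tmp" "$workdir/junk"
else
    rm -f -- "$tmp"
fi

cat "$workdir/junk"
```

One thing to keep in mind: the renamed file takes on the ownership and permissions of the temporary file, which mktemp creates readable only by its owner. If the original mode matters, restore it before or after the mv.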

larsks answered Nov 15 '22 06:11


If you want to edit that file in place, you can just use an editor:

ex junk << EOF
%!(sort -k1,1 |tr 'b' 'z')
x
EOF
pizza answered Nov 15 '22 04:11