In cygwin, the following code works fine
$ cat junk
bat
bat
bat
$ cat junk | sort -k1,1 |tr 'b' 'z' > junk
$ cat junk
zat
zat
zat
But in the linux shell(GNU/Linux), it seems that overwriting doesn't work
[41] othershell: cat junk
cat
cat
cat
[42] othershell: cat junk |sort -k1,1 |tr 'c' 'z'
zat
zat
zat
[43] othershell: cat junk |sort -k1,1 |tr 'c' 'z' > junk
[44] othershell: cat junk
Both environments run BASH.
I am asking this because sometimes after I do text manipulation, because of this caveat, I am forced to make the tmp file. But I know in Perl, you can give "i" flag to overwrite the original file after some operations/manipulations. I just want to ask if there is any foolproof method in unix pipeline to overwrite the file that I am not aware of.
Pipe may be the most useful tool in your shell scripting toolbox. It is one of the most used, but also, one of the most misunderstood. As a result, it is often overused or misused. This should help you use a pipe correctly and hopefully make your shell scripts much faster and more efficient.
A FIFO, also known as a named pipe, is a special file similar to a pipe but with a name on the filesystem. Multiple processes can access this special file for reading and writing like any ordinary file. Thus, the name works only as a reference point for processes that need to use a name in the filesystem.
By default sed does not overwrite the original file; it writes to stdout (hence the result can be redirected using the shell operator > as you showed).
Pipe is used to combine two or more commands, and in this, the output of one command acts as input to another command, and this command's output may act as input to the next command and so on. It can also be visualized as a temporary connection between two or more commands/ programs/ processes.
Four main points here:
One of the reasons you are getting inconsistent behavior is that you are piping to a process that has redirection, rather than redirecting the output of the pipeline as a whole. The difference is subtle, but important.
What you want is to create a compound command with Command Grouping, so that you can redirect the input and output of the whole pipeline. In your case, this should work properly:
{ sort -k1,1 | tr 'c' 'z'; } < junk > sorted_junk
Please note that without anything to sort, you might as well skip the sort command too. Then your command can be run without the need for command grouping:
tr 'c' 'z' < junk > sorted_junk
Keep redirections and pipelines as simple as possible. It makes debugging your scripts much easier.
However, if you still want to abuse the pipeline for some reason, you could use the sponge utility from the moreutils package. The man page says:
sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constricting pipelines that read from and write to the same file.
So, your original command line can be re-written like this:
cat junk | sort -k1,1 | tr 'c' 'z' | sponge junk
and since junk will not be overwritten until sponge receives EOF from the pipeline, you will get the results you were expecting.
In general this can be expected to break. The processes in a pipeline are all started up in parallel, so the > junk
at the end of the line will usually truncate your input file before the process at the head of the pipelining has finished (or even started) reading from it.
Even if bash under Cygwin let's you get away with this you shouldn't rely on it. The general solution is to redirect to a temporary file and then rename it when the pipeline is complete.
You want to edit that file, you can just use the editor.
ex junk << EOF
%!(sort -k1,1 |tr 'b' 'z')
x
EOF
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With