Remove duplicate lines and overwrite file in same command

Tags: bash, awk

I'm trying to remove duplicate lines from a file and update the file in place. Redirecting the output straight back to the same file doesn't work, so I have to write to a new file and then replace the original. Is this the only way?

awk '!seen[$0]++' .gitignore > .gitignore

awk '!seen[$0]++' .gitignore > .gitignore_new && mv .gitignore_new .gitignore
ThomasReggi asked Jun 11 '16 20:06


2 Answers

Redirecting the output to the same file that is being read, like this:

awk '!seen[$0]++' .gitignore > .gitignore

will leave you with an empty file. With the > operator, the shell opens and truncates the file before the command is executed, so awk reads an already empty file and all your data is lost.
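The truncation can be reproduced safely on a throwaway file. This is a minimal sketch; the scratch directory and the name list.txt are made up for the demonstration and are not part of the question.

```shell
# Work in a scratch directory so no real file is touched.
demo_dir=$(mktemp -d)
printf 'a\nb\na\n' > "$demo_dir/list.txt"

# The shell truncates list.txt before awk starts, so awk has nothing to read.
awk '!seen[$0]++' "$demo_dir/list.txt" > "$demo_dir/list.txt"

# Report how many bytes are left in the file.
wc -c < "$demo_dir/list.txt"
```

After this runs, list.txt still exists but is empty, which is the behavior described above.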

With GNU awk 4.1.0 or later you can use the -i inplace option to edit the file in place:

awk -i inplace '!seen[$0]++' .gitignore

If you don't have a recent version of GNU awk, you'll need to create a temporary file:

awk '!seen[$0]++' .gitignore > .gitignore.tmp && mv .gitignore.tmp .gitignore
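If the temporary-file approach is needed often, it could be wrapped in a small function. The following is only a sketch: dedupe_in_place is a hypothetical name, and it assumes a mktemp that accepts a template argument (as the GNU and BSD versions do).

```shell
# dedupe_in_place is a hypothetical helper name, not a standard command.
dedupe_in_place() {
    file=$1
    # Create the temporary file next to the target so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Replace the original only if awk succeeded; otherwise clean up.
    if awk '!seen[$0]++' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Try it on a throwaway file.
work_dir=$(mktemp -d)
printf 'x\ny\nx\n' > "$work_dir/sample.txt"
dedupe_in_place "$work_dir/sample.txt"
cat "$work_dir/sample.txt"
```

One thing to keep in mind: because the original is replaced by the temporary file, it takes on that file's permissions and ownership, which may differ from those of the original.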

Another alternative is to use the sponge program from moreutils:

awk '!seen[$0]++' .gitignore | sponge .gitignore

sponge soaks up all of its standard input and only then opens the output file. This keeps the input file intact until awk has finished reading it.
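To illustrate the idea for systems without moreutils, here is a rough stand-in written as a shell function. The name soak is hypothetical, and this is not how the real sponge is implemented; it only mimics the "read everything first, write afterwards" order.

```shell
# soak is a hypothetical stand-in for sponge, for illustration only.
soak() {
    buf=$(mktemp) || return 1
    # Read all of standard input into a buffer file first ...
    cat > "$buf"
    # ... and only then open (and truncate) the target file.
    cat "$buf" > "$1"
    rm -f "$buf"
}

# Try it on a throwaway file.
scratch=$(mktemp -d)
printf 'a\nb\na\n' > "$scratch/list.txt"
awk '!seen[$0]++' "$scratch/list.txt" | soak "$scratch/list.txt"
cat "$scratch/list.txt"
```

Because the target is opened only after awk has closed its end of the pipe, the input is fully read before it is overwritten.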

hek2mgl answered Nov 11 '22 10:11


Thomas, I believe the problem is that you are reading from the file and writing to it in the same command. This is why you must write to a temporary file first.

The > operator does overwrite, so you are using the correct redirection operator:

  • Redirect output from a command to a file on disk. Note: if the file already exists, it will be erased and overwritten without warning, so be careful.

Example: ps -ax >processes.txt uses the ps command to get a list of the processes running on the system and stores the output in a file named processes.txt.

Chewy answered Nov 11 '22 09:11