
How to remove duplicates from a file and write to the same file?

I know my title is not very self-explanatory, so let me explain it here.

I have a file named test.txt which has some duplicate lines. I want to remove those duplicate lines and, at the same time, update test.txt with the new content.

test.txt

AAAA
BBBB
AAAA
CCCC

I know I can use sort -u test.txt to remove the duplicates, but to update the file with the new content, how do I redirect its output to the same file? The command below doesn't work.

sort -u test.txt > test.txt

So why does the above command not work, and what is the correct way?

Also, is there another way, something like

sort_and_update_file test.txt

which sorts and automatically updates my file without any need for redirection?

asked Jul 07 '12 at 13:07 by ronnie




2 Answers

This might work for you:

sort -u -o test.txt test.txt
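
The question also asks for a single command that needs no redirection. A minimal sketch of such a wrapper, assuming a POSIX shell: the function name sort_and_update_file is taken from the question's own example and is hypothetical, not a standard command, and the scratch directory is only there so the demo touches no real file.

```shell
# Hypothetical helper named after the question's example; not a standard command.
sort_and_update_file() {
    # -u drops duplicate lines; -o writes the result to the named file,
    # here the same file that is being read.
    sort -u -o "$1" "$1"
}

# Demo in a scratch directory, using the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'AAAA\nBBBB\nAAAA\nCCCC\n' > test.txt

sort_and_update_file test.txt
cat test.txt
```

Like sort -u itself, this leaves the lines sorted, so the original line order is not preserved.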
answered Nov 16 '22 at 01:11 by potong


Redirection in the shell will not work because you are trying to read from and write to the same file at the same time. In fact, the shell opens the file for writing, which truncates it (> test.txt), before sort is even executed, so sort ends up reading an empty file.
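
The truncation can be observed directly. This is a small sketch, assuming a POSIX shell with the noclobber option unset; it recreates the question's sample file in a scratch directory and runs the failing command.

```shell
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'AAAA\nBBBB\nAAAA\nCCCC\n' > test.txt

# The shell sets up the redirection first, which empties test.txt;
# only then does sort start, and it finds nothing to read.
sort -u test.txt > test.txt

# Report what is left of the file.
if [ -s test.txt ]; then
    echo "test.txt still has content"
else
    echo "test.txt is now empty"
fi
```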

@potong's answer works because the sort program itself probably reads all lines into memory before it writes any output. I would not rely on that, because the man page does not explicitly state that the output file CAN be the same as the input file (though it will likely work). Unless a tool is documented to work "in place", I would not do it. (@perreal's answer would work, or you can store the intermediate results in shell memory.)
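
Two fallbacks that do not depend on how sort handles its output file can be sketched as follows. Both are assumptions about what a safe workaround might look like, not a reproduction of any particular answer; the file test2.txt and the variable names are made up for the demo.

```shell
# Work in a scratch directory with the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'AAAA\nBBBB\nAAAA\nCCCC\n' > test.txt
printf 'AAAA\nBBBB\nAAAA\nCCCC\n' > test2.txt

# Option 1: write to a temporary file, then replace the original
# only if sort succeeded. Note that the replaced file takes on the
# temporary file's permissions.
tmp=$(mktemp ./test.txt.XXXXXX)
sort -u test.txt > "$tmp" && mv "$tmp" test.txt

# Option 2: hold the result in a shell variable. The command
# substitution finishes before the redirection on the next line
# truncates the file. Trailing newlines are stripped by $(...),
# so printf adds exactly one back; suited to small text files.
sorted=$(sort -u test2.txt)
printf '%s\n' "$sorted" > test2.txt
```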

answered Nov 16 '22 at 00:11 by nhed