This sounds simple on its face but is actually somewhat more complex. I would like to use a unix utility to delete consecutive duplicates, leaving the original. But, I would also like to preserve other duplicates that do not occur immediately after the original. For example, if we have the lines:
O B
O B
C D
T V
O B
I want the output to be:
O B
C D
T V
O B
Although the first and last lines are the same, they are not consecutive and therefore I want to keep them as unique entries.
Remove duplicate lines with uniq

If you don't need to preserve the order of the lines in the file, combining sort and uniq will remove every duplicate in a very straightforward way: sort puts the lines in alphanumeric order, and uniq then reduces each run of identical adjacent lines to one.

In your case, though, you don't need sort at all. The uniq command works as a filter: it compares each line only with the line immediately before it and collapses sequential identical lines into a single copy. Because it never looks beyond adjacent lines, duplicates that are not consecutive are left untouched — which is exactly the behavior you're asking for. (This adjacent-only matching is also why uniq is normally run on sorted input when the goal is to remove all duplicates.)
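To see that uniq really does leave non-consecutive duplicates alone, here is a quick sketch using the sample data from your question (the file name file1 is just illustrative):

```shell
# Create a sample file matching the question's input.
printf '%s\n' 'O B' 'O B' 'C D' 'T V' 'O B' > file1

# uniq collapses only the adjacent duplicate pair at the top;
# the final 'O B' is not adjacent to another copy, so it survives.
uniq file1
# O B
# C D
# T V
# O B
```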
You can do:
cat file1 | uniq > file2
or more succinctly:
uniq file1 file2
assuming file1 contains:
O B
O B
C D
T V
O B
For more details, see man uniq. In particular, note that the uniq command accepts two optional arguments with the following syntax: uniq [OPTION]... [INPUT [OUTPUT]].
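Two standard options are also worth knowing here (both are POSIX, so they should be portable): -c prefixes each output line with the number of consecutive occurrences, and -d prints only the lines that had at least one adjacent duplicate. A small sketch on the same sample data:

```shell
printf '%s\n' 'O B' 'O B' 'C D' 'T V' 'O B' > file1

# -c: count how many consecutive copies of each line were seen.
# The first 'O B' gets a count of 2; everything else a count of 1.
uniq -c file1

# -d: show only the lines that were repeated consecutively.
uniq -d file1
# O B
```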
Finally if you'd want to remove all duplicates (and sort the file along the way), you could do:
sort -u file1 > file2
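For comparison, on the sample input sort -u removes all duplicates rather than only adjacent ones, so the trailing 'O B' is lost and the original order is not preserved:

```shell
printf '%s\n' 'O B' 'O B' 'C D' 'T V' 'O B' > file1

# sort -u sorts the lines and keeps one copy of each distinct line,
# so only a single 'O B' remains, in sorted position.
sort -u file1
# C D
# O B
# T V
```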