Is there a way to delete duplicate lines in a file in Unix?

I can do it with the sort -u and uniq commands, but I want to use sed or awk. Is that possible?
If you don't need to preserve the order of the lines in the file, the sort and uniq commands will do what you need in a very straightforward way. By default, uniq discards all but the first of adjacent repeated lines, so that no output line is repeated (optionally, it can instead print only the duplicate lines). Because uniq only collapses adjacent duplicates, the input is normally sorted first: sort puts the lines in alphanumeric order, and uniq then reduces sequential identical lines to one. sort -u does both steps at once.
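For example, given an illustrative file names.txt (a made-up name) containing repeated lines:

$ cat names.txt
alice
bob
alice
bob

$ sort names.txt | uniq    # sort, then collapse adjacent duplicates
alice
bob

$ sort -u names.txt        # same result in a single step
alice
bob

Note that the original line order is lost; the output is sorted. If you want to remove duplicates while keeping the original order, awk can do it in one pass: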
awk '!seen[$0]++' file.txt
seen is an associative array that AWK adds every input line to, keyed by the line itself ($0). If a line hasn't been encountered yet, seen[$0] evaluates to false. The ! is the logical NOT operator and inverts that false to true, and AWK prints each line for which the whole expression evaluates to true.

The ++ increments seen[$0], so seen[$0] == 1 after the first time a line is found, then seen[$0] == 2, and so on. AWK treats every value except 0 and "" (the empty string) as true, so once a duplicate line is already in seen, !seen[$0] evaluates to false and the line is not written to the output.
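A quick usage sketch (file.txt is just a placeholder name), showing that this keeps the first occurrence of each line, preserves the original order, and needs no sorting:

$ cat file.txt
bob
alice
bob
carol
alice

$ awk '!seen[$0]++' file.txt
bob
alice
carol

If you specifically want sed, it has no associative arrays, so it can only comfortably emulate uniq. This classic one-liner from the widely circulated "useful sed one-liners" collection deletes duplicate consecutive lines, keeping the first line of each run:

$ sed '$!N; /^\(.*\)\n\1$/!P; D' file.txt

Like uniq, it only removes adjacent duplicates, so you would still need to sort the input first to remove all of them.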