I have a file.txt with multiple lines and would like to remove duplicate lines without sorting the file. What command can I use in Unix bash?
Sample of file.txt:
orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple
orangejuice;orange;juice_apple
Sample of output:
orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple
Remove duplicate lines with sort and uniq

If you don't need to preserve the order of the lines in the file, the sort and uniq commands will do what you need in a straightforward way. The sort command arranges the lines in alphanumeric order, and the uniq command reduces sequential identical lines to one. Note that uniq acts as a filter that only removes adjacent matching lines, which is why the input must be sorted first. Because the output comes out sorted, this approach does not preserve the original line order, so it doesn't strictly answer the question as asked, but it is the classic solution when order doesn't matter.
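As a quick sketch, recreating the sample file from the question:

```shell
# Recreate the sample file from the question
printf '%s\n' \
  'orangejuice;orange;juice_apple' \
  'pineapplejuice;pineapple;juice_pineapple' \
  'orangejuice;orange;juice_apple' > file.txt

# sort then uniq: duplicates removed, but the output order is sorted
sort file.txt | uniq

# sort -u does the same thing in one step
sort -u file.txt
```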
One way, using awk:

awk '!a[$0]++' file.txt
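Spelled out, the condensed expression is equivalent to this more explicit form (a sketch; "seen" is just an arbitrary array name):

```shell
# seen[$0] counts occurrences of each whole line ($0).
# A line is printed only when its count was still 0 (first occurrence),
# so duplicates are dropped while the original order is preserved.
awk '{ if (seen[$0]++ == 0) print }' file.txt
```

In the one-liner form, !a[$0]++ serves as the pattern, and awk's default action for a true pattern is to print the line.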
You can use Perl for this:
perl -ne 'print unless $seen{$_}++' file.txt
The -n switch makes Perl process the file line by line. Each line ($_) is stored as a key in the %seen hash; since the post-increment ++ returns the old value (0 on the first occurrence), the unless condition is true only the first time a line appears, so each distinct line is printed exactly once, in its original order.