
How to delete duplicate lines in a text file in Unix Bash? [duplicate]

Tags:

bash

I have a file.txt with multiple lines, and I would like to remove the duplicate lines without sorting the file. What command can I use in Unix Bash?

Sample of file.txt:

orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple
orangejuice;orange;juice_apple

Sample of output:

orangejuice;orange;juice_apple
pineapplejuice;pineapple;juice_pineapple

t28292 asked Aug 11 '13 09:08



2 Answers

One way using awk:

awk '!a[$0]++' file.txt
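
To make the one-liner easier to follow, here is a rough long-form sketch of what it appears to do: awk keeps a per-line counter in an array and prints a line only when that counter was still zero before the increment. The temporary file from mktemp and the names sample, seen and deduped are assumptions made for this illustration, not part of the answer.

```shell
# Recreate the question's sample in a temporary file (illustrative path from mktemp).
sample=$(mktemp)
printf '%s\n' \
  'orangejuice;orange;juice_apple' \
  'pineapplejuice;pineapple;juice_pineapple' \
  'orangejuice;orange;juice_apple' > "$sample"

# Long form of awk '!a[$0]++':
# seen[$0] is the number of times this exact line has appeared so far.
# Print the line only when that count is still 0, then increment it.
deduped=$(awk '{ if (seen[$0] == 0) print; seen[$0]++ }' "$sample")

printf '%s\n' "$deduped"
rm -f "$sample"
```

Since awk reads the file and writes to standard output, file.txt itself is not modified; to keep the result, redirect the output to a different file and rename it afterwards.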

Steve answered Oct 15 '22 23:10


You can use Perl for this:

perl -ne 'print unless $seen{$_}++' file.txt

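The -n switch makes Perl process the file line by line without printing anything by default. Each line ($_) is used as a key in a hash named %seen. Because the postfix ++ returns the old value before incrementing it, $seen{$_}++ is false the first time a line is met, so the line is printed; on every later occurrence it is true, so the line is skipped.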
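
The same "seen" idea can also be sketched in Bash itself, which may help to see the logic step by step. This is only an illustration, assuming Bash 4 or later (for associative arrays); the function name dedupe_lines and the variable names are hypothetical, and for large files the awk or Perl one-liners are likely to be much faster.

```shell
#!/usr/bin/env bash
# Pure-Bash sketch of the "seen" hash idea (requires Bash 4+ for associative arrays).
# dedupe_lines is a hypothetical helper: it reads stdin and prints each distinct
# line once, in order of first appearance.
dedupe_lines() {
  local -A seen=()   # keys: lines that have already been printed
  local line
  # "|| [[ -n $line ]]" also handles a final line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    # Prefix the key with "_" so that an empty line is still a valid subscript.
    if [[ -z ${seen["_$line"]+x} ]]; then
      printf '%s\n' "$line"
      seen["_$line"]=1
    fi
  done
}

# Try it on the sample from the question.
result=$(printf '%s\n' \
  'orangejuice;orange;juice_apple' \
  'pineapplejuice;pineapple;juice_pineapple' \
  'orangejuice;orange;juice_apple' | dedupe_lines)
printf '%s\n' "$result"
```

With a real file, the function could be called as dedupe_lines < file.txt.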

choroba answered Oct 15 '22 23:10