 

Linux command or script counting duplicated lines in a text file?

Pipe it through sort (to bring identical lines together), then uniq -c to prefix each line with its count:

sort filename | uniq -c

To get that list sorted by frequency, run:

sort filename | uniq -c | sort -nr
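A quick end-to-end sketch of the pipeline above, using a hypothetical sample file data.txt:

```shell
# Build a small sample file (hypothetical name and contents)
printf 'orange\napple\norange\nbanana\norange\napple\n' > data.txt

# sort groups identical lines, uniq -c counts each group,
# sort -nr orders the result by count, highest first
sort data.txt | uniq -c | sort -nr
# prints, most frequent first: orange (3), apple (2), banana (1)
```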

Almost the same as borribles', but if you add the -d flag to uniq it shows only the duplicated lines.

sort filename | uniq -cd | sort -nr
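To illustrate the effect of -d, here is a sketch with a hypothetical file dupes.txt in which one line occurs only once:

```shell
# gamma appears only once; alpha and beta are duplicated
printf 'alpha\nbeta\nalpha\ngamma\nbeta\nalpha\n' > dupes.txt

# uniq -cd counts groups but prints only those with count > 1,
# so gamma is omitted from the output
sort dupes.txt | uniq -cd | sort -nr
# prints: alpha (3), beta (2); gamma does not appear
```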

uniq -c file

and in case the file is not sorted already:

sort file | uniq -c


cat <filename> | sort | uniq -c

Try this

cat myfile.txt | sort | uniq

Can you live with an alphabetically sorted list?

echo "red apple
> green apple
> green apple
> orange
> orange
> orange
> " | sort -u 


green apple
orange
red apple

or

sort -u FILE

The -u flag stands for unique; with sort, deduplication comes as a side effect of sorting.

A solution which preserves the order:

echo "red apple
green apple
green apple
orange
orange
orange
" | { old=""; while read -r line; do if [[ $line != "$old" ]]; then echo "$line"; old=$line; fi; done }
red apple
green apple
orange

and, with a file

cat file | {
old=""
while read -r line
do
  if [[ $line != "$old" ]]
  then
    echo "$line"
    old=$line
  fi
done }

The last two approaches only remove duplicates that follow immediately after each other, which fits your example.

echo "red apple
green apple
lila banana
green apple
" ...

will print both apples, separated by the banana.
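If you need every duplicate removed regardless of position, while still preserving first-occurrence order, a common awk idiom handles the banana example above in one pass:

```shell
# awk '!seen[$0]++' prints a line only the first time it is seen:
# seen[$0] is 0 (falsy) on first encounter, so the line is printed,
# then the post-increment makes every later occurrence falsy-negated
printf 'red apple\ngreen apple\nlila banana\ngreen apple\n' | awk '!seen[$0]++'
# prints: red apple, green apple, lila banana (second "green apple" dropped)
```

Unlike sort -u, this keeps the input order and needs no sorting at all, at the cost of holding each distinct line in memory.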