Bash Script: count unique lines in file

Tags:

bash


You can use the uniq command with -c to count how many times each line occurs. uniq only collapses adjacent duplicates, so sort the input first:

sort ips.txt | uniq -c

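To list the most frequent lines first (thanks to Peter Jaric), sort the counts in reverse numeric order; -b ignores leading blanks, -g compares general numeric values, and -r reverses the order: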

sort ips.txt | uniq -c | sort -bgr
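
As a small worked example of why the sort matters, the sketch below builds a throwaway file and runs uniq -c with and without sorting first. The temporary file and the addresses in it are made up for illustration and are not from the original question.

```shell
# Throwaway sample data; the addresses are invented for this illustration.
demo=$(mktemp)
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 10.0.0.2 > "$demo"

# Unsorted input: no two identical lines are adjacent, so uniq reports
# six separate groups, each with a count of 1.
uniq -c "$demo"

# Sorted input: identical lines become adjacent, so each address gets one
# total; the final sort puts the highest count first.
sort "$demo" | uniq -c | sort -bgr
```

With this data the sorted pipeline should list 10.0.0.1 first, with a count of 3.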

To count the number of distinct lines (each duplicated line counted only once), use uniq or Awk together with wc:

sort ips.txt | uniq | wc -l
awk '!seen[$0]++' ips.txt | wc -l

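Awk's arrays are associative (hash-based), so the Awk version makes a single pass over the file without sorting it and may run faster than the sort pipeline, at the cost of keeping every distinct line in memory.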

Generating a test file and timing both approaches:

$ for i in {1..100000}; do echo $RANDOM; done > random.txt
$ time sort random.txt | uniq | wc -l
31175

real    0m1.193s
user    0m0.701s
sys     0m0.388s

$ time awk '!seen[$0]++' random.txt | wc -l
31175

real    0m0.675s
user    0m0.108s
sys     0m0.171s
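
The Awk one-liner timed above relies on a compact idiom, which the sketch below unpacks. seen[$0] is 0 the first time a line appears, so !seen[$0]++ is true only for first occurrences, and Awk's default action prints those lines. The helper names first_occurrences and count_distinct are hypothetical, and the second one counts inside Awk instead of piping to wc -l, which is a variation on the original command rather than the same thing.

```shell
# Hypothetical helper: print each line the first time it appears,
# keeping the original input order.
first_occurrences() {
    awk '!seen[$0]++'
}

# Hypothetical helper: count distinct lines inside awk.
# "n + 0" forces a numeric 0 when the input is empty.
count_distinct() {
    awk '!seen[$0]++ { n++ } END { print n + 0 }'
}

printf '%s\n' b a b c a | first_occurrences   # prints b, a, c on separate lines
printf '%s\n' b a b c a | count_distinct      # prints 3
```

Unlike the sort pipeline, this keeps lines in the order they first appeared, which can matter if the output is reused for something other than counting.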

This is a fast way to count how often each line occurs and print the results sorted from the least frequent to the most frequent:

awk '{ seen[$0]++ } END { for (i in seen) print seen[i], i }' ips.txt | sort -n

If you don't care about performance and you want something easier to remember, then simply run:

sort ips.txt | uniq -c | sort -n
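
As a quick sanity check that the Awk tally and the sort | uniq -c pipeline agree, the sketch below runs both on a small made-up sample and compares the results. The temporary file names and the sed step that strips the padding uniq -c adds are assumptions made for this illustration.

```shell
# Small invented sample: x appears 3 times, y twice, z once.
tally=$(mktemp)
printf '%s\n' x y x z x y > "$tally"

# Tally with awk's associative array. The for-in order is unspecified,
# so sort -n gives a stable least-to-most-frequent listing.
awk '{ seen[$0]++ } END { for (line in seen) print seen[line], line }' "$tally" \
    | sort -n > "$tally.awk"

# Same tally with sort | uniq -c; sed removes the leading padding so the
# two outputs can be compared byte for byte.
sort "$tally" | uniq -c | sed 's/^ *//' | sort -n > "$tally.uniq"

# cmp -s is silent and succeeds only if the files are identical.
cmp -s "$tally.awk" "$tally.uniq" && echo "both methods agree"
cat "$tally.awk"
```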

PS:

sort -n parses the leading field as a number, which is what we want here because we are sorting by the counts.
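
To see the difference this makes, the sketch below sorts three made-up count lines with and without -n. The values are invented for illustration, and LC_ALL=C is set on the plain sort only so that the byte-wise ordering is the same on every system.

```shell
# Invented "count line" pairs, similar to what the awk loop prints.
counts=$(mktemp)
printf '%s\n' '10 alpha' '9 beta' '100 gamma' > "$counts"

# Plain sort compares character by character, so "9 beta" lands last
# because the character 9 sorts after the character 1.
LC_ALL=C sort "$counts"

# sort -n compares the leading number, giving 9, then 10, then 100.
sort -n "$counts"
```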