
How to sort, uniq and display lines that appear more than X times

Tags:

bash

sorting

uniq

I have a file like this:

80.13.178.2
80.13.178.2
80.13.178.2
80.13.178.2
80.13.178.1
80.13.178.3
80.13.178.3
80.13.178.3
80.13.178.4
80.13.178.4
80.13.178.7

I need to display the unique entries for repeated lines (similar to uniq -d), but only entries that occur more than a given number of times. Twice is just an example; I need the flexibility to define the lower limit.

Output for this example should be like this when looking for entries with three or more occurrences:

80.13.178.2
80.13.178.3

Andrew Kennen asked Nov 22 '13 14:11



2 Answers

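Feed the output from uniq -cd to awk. uniq -c prefixes each line with its count and -d keeps only the repeated lines; awk then prints the lines whose count exceeds limit, so limit=2 gives entries with three or more occurrences: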

sort test.file | uniq -cd | awk -v limit=2 '$1 > limit{print $2}'
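
If you need this more than once, the pipeline can be wrapped in a small function. This is only a sketch: the name `repeated_lines` and its arguments are made up for illustration and are not part of the answer above. It uses plain `uniq -c` instead of `uniq -cd`, since awk filters on the count anyway, and it assumes the lines contain no whitespace (true for IP addresses), because only `$2` is printed.

```shell
# Hypothetical helper: print each line of FILE that occurs more than LIMIT times.
# Usage: repeated_lines FILE [LIMIT]    (LIMIT defaults to 2)
repeated_lines() {
    file=$1
    limit=${2:-2}
    # sort groups identical lines, uniq -c prefixes each group with its count,
    # awk keeps the groups whose count is greater than the limit.
    # Assumption: lines have no embedded whitespace, so $2 is the whole line.
    sort -- "$file" | uniq -c | awk -v limit="$limit" '$1 > limit { print $2 }'
}
```

With the sample file from the question, `repeated_lines test.file 2` prints 80.13.178.2 and 80.13.178.3, and `repeated_lines test.file 1` also includes 80.13.178.4.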

iruvar answered Oct 21 '22 08:10


With pure awk:

awk '{a[$0]++}END{for(i in a){if(a[i] > 2){print i}}}' a.txt 

It iterates over the file and counts the occurrences of every IP. At the end of the file, it prints every IP that occurs more than 2 times.
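
One caveat with an END loop: awk does not specify the order in which `for (i in a)` visits the keys, so the results may come out in any order. Below is a minimal sketch of a single-pass variant that avoids this. The wrapper name `repeated_lines_awk` and the `limit` variable are assumptions for illustration, not part of the answer above. It prints a line at the moment its count reaches limit + 1, so each qualifying line appears exactly once, in input order, and the file does not need to be sorted.

```shell
# Hypothetical helper: print each line of FILE that occurs more than LIMIT times,
# in the order in which the lines cross the threshold.
# Usage: repeated_lines_awk FILE [LIMIT]    (LIMIT defaults to 2)
repeated_lines_awk() {
    # count[$0] is incremented for every input line; the pattern is true only
    # on the occurrence that brings the count to limit + 1, so the default
    # action (print the line) fires once per qualifying line.
    awk -v limit="${2:-2}" '++count[$0] == limit + 1' "$1"
}
```

For the sample data in the question, `repeated_lines_awk a.txt 2` prints 80.13.178.2 followed by 80.13.178.3.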

hek2mgl answered Oct 21 '22 09:10