
How to grep number of unique occurrences

Tags: grep, bash, awk

I understand that grep -c string counts the lines that contain a given string. What I would like to do is count how many times each unique value occurs when only part of the string is known or remains constant.

For example, suppose I have a file (in this case a log) with several lines containing a constant string followed by a value that repeats, like so:

string=value1
string=value1
string=value1
string=value2
string=value3
string=value2

Then I would like to get a count for each unique value, with output similar to the following (ideally from a single grep/awk command):

value1 = 3 occurrences
value2 = 2 occurrences
value3 = 1 occurrences

Does anyone have a solution using grep or awk that might work? Thanks in advance!

Simpleton asked Sep 11 '13 22:09


3 Answers

This worked perfectly... Thanks to everyone for your comments!

grep -oP "wwn=[^,]*" path/to/file | sort | uniq -c
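
For reference, the same pipeline can be tried against the sample lines from the question. This is only a minimal sketch under a few assumptions: the temporary file and the string= pattern stand in for the real log and its wwn= field, plain grep -o is used because this pattern does not need the Perl syntax that -P enables, and the cut and awk steps are one possible way to reshape the uniq -c output into the format asked for.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' string=value1 string=value1 string=value1 \
              string=value2 string=value3 string=value2 > "$tmp"

# grep -o prints only the matched text, cut keeps what follows the "=",
# sort brings identical values together, and uniq -c counts each run.
# The final awk swaps the columns, because uniq -c puts the count first.
result=$(grep -o 'string=[^,]*' "$tmp" | cut -d= -f2 | sort | uniq -c |
         awk '{ print $2 " = " $1 " occurrences" }')
printf '%s\n' "$result"
rm -f "$tmp"
```

The sort step matters: uniq -c only merges adjacent identical lines, so unsorted input would report the same value more than once.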

Simpleton answered Sep 26 '22 02:09


In general, if you want to match lines and also keep a tally of what matched, awk is a good fit: it handles both the pattern matching and the counting with very simple syntax.

So for your given file I would use:

$ awk -F= '/string=/ {count[$2]++} END {for (i in count) print i, count[i]}' file
value1 3
value2 2
value3 1

What is this doing?

  • -F=
    sets the field separator to =, so that the text to the left and to the right of it become fields $1 and $2.
  • /string=/ {count[$2]++}
    on every line that matches "string=", increments count[$2]. The array count[] keeps track of how many times each value of the second field has appeared so far.
  • END {for (i in count) print i, count[i]}
    at the end, loops through the results and prints them. awk does not define the order of for (i in count), so the lines may come out in any order.
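
The steps above can be adapted to print the exact format requested in the question. The following is a sketch rather than a definitive version: the temporary file is a stand-in for the real log, the ^ anchor is an added assumption that the constant string starts the line, and the output is piped through sort because awk leaves the loop order unspecified.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' string=value1 string=value1 string=value1 \
              string=value2 string=value3 string=value2 > "$tmp"

# Count each value of the second field on lines starting with "string=",
# print one "value = N occurrences" line per value, then sort the lines
# so the result does not depend on awk's array iteration order.
result=$(awk -F= '/^string=/ { count[$2]++ }
                  END { for (v in count) printf "%s = %d occurrences\n", v, count[v] }' "$tmp" |
         LC_ALL=C sort)
printf '%s\n' "$result"
rm -f "$tmp"
```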
fedorqui 'SO stop harming' answered Sep 25 '22 02:09


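Here's an awk script that counts, for each pattern given on the command line, the distinct lines that match it:

#!/usr/bin/awk -f

# Usage: script.awk FILE PATTERN...
BEGIN {
    file = ARGV[1]
    while ((getline line < file) > 0) {
        for (i = 2; i < ARGC; ++i) {
            p = ARGV[i]
            if (line ~ p) {
                # a[p, line] is 0 the first time this line is seen for p,
                # so a[p] grows by 1 only for lines not counted before.
                a[p] += !a[p, line]++
            }
        }
    }
    for (i = 2; i < ARGC; ++i) {
        p = ARGV[i]
        printf("%s = %d occurrences\n", p, a[p])
    }
    exit    # everything runs in BEGIN; skip awk's normal input loop
}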

Example:

awk -f script.awk somefile ab sh

Output:

ab = 7 occurrences
sh = 2 occurrences
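
To see what the script counts on the question's data, the sketch below runs it on the six sample lines. The temporary files and the two patterns (value1 and value) are assumptions chosen for illustration. Because the script tallies distinct matching lines rather than total matches, value1 is reported once even though it appears on three lines, while value matches three different lines.

```shell
# Save the awk script from the answer to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
BEGIN {
    file = ARGV[1]
    while ((getline line < file) > 0) {
        for (i = 2; i < ARGC; ++i) {
            p = ARGV[i]
            if (line ~ p) {
                a[p] += !a[p, line]++
            }
        }
    }
    for (i = 2; i < ARGC; ++i) {
        p = ARGV[i]
        printf("%s = %d occurrences\n", p, a[p])
    }
    exit
}
EOF

# Recreate the sample input from the question.
tmp=$(mktemp)
printf '%s\n' string=value1 string=value1 string=value1 \
              string=value2 string=value3 string=value2 > "$tmp"

# First argument is the file to read, the rest are regex patterns.
result=$(awk -f "$script" "$tmp" value1 value)
printf '%s\n' "$result"
rm -f "$script" "$tmp"
```

Passing the constant part of the string as the pattern therefore gives the number of distinct values in the file, not the per-value counts shown in the question.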
konsolebox answered Sep 25 '22 02:09