I have x lines like this:
Unable to find latest released revision of 'CONTRIB_046578'.
I need to extract the word between "revision of '" and the closing "'" (in this example, the word CONTRIB_046578) and, if possible, count the number of occurrences of that word. Can this be done with grep, sed, or any other command?
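The cleanest solution is grep -Po "(?<=')[^']+(?=')". This needs GNU grep with Perl-compatible regex support (-P); -o prints only the matched text rather than the whole line.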
$ cat file
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'foo'
Unable to find latest released revision of 'bar'
Unable to find latest released revision of 'CONTRIB_046578'
# Print occurrences
$ grep -Po "(?<=')[^']+(?=')" file
CONTRIB_046578
foo
bar
CONTRIB_046578
# Count matching lines (-c counts lines, not individual matches)
$ grep -Pc "(?<=')[^']+(?=')" file
4
# Count occurrences of each distinct word
$ grep -Po "(?<=')[^']+(?=')" file | sort | uniq -c
2 CONTRIB_046578
1 bar
1 foo
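Since the question also mentions sed, and grep -P is not available everywhere, here is a minimal sketch of the same extraction with sed. The helper name extract_words and the temporary sample file are assumptions made for illustration, and LC_ALL=C is set only so that the sort order does not depend on the locale.

```shell
# Sample input written to a temporary file (same data as the example above).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'foo'
Unable to find latest released revision of 'bar'
Unable to find latest released revision of 'CONTRIB_046578'
EOF

# extract_words is a hypothetical helper, not part of the original answer.
# sed -n suppresses the default output; the trailing p prints only the lines
# where the substitution matched, so lines without a quoted word are dropped.
extract_words() {
    sed -n "s/.*revision of '\([^']*\)'.*/\1/p" "$1"
}

extract_words "$sample"                            # one word per line
extract_words "$sample" | LC_ALL=C sort | uniq -c  # count per distinct word
```

The pattern is anchored on the literal text "revision of '", so a stray quote elsewhere on the line is less likely to be picked up than with a bare quote match.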
All you need is a very simple awk script to count the occurrences of what's between the quotes:
awk -F\' '{c[$2]++} END{for (w in c) print w,c[w]}' file
Using @anubhava's test input file:
$ cat file
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046570'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046579'
$
$ awk -F\' '{c[$2]++} END{for (w in c) print w,c[w]}' file
CONTRIB_046578 1
CONTRIB_046579 3
CONTRIB_046570 1
CONTRIB_046572 2
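Note that awk's for (w in c) loop visits keys in an unspecified order, which is why the output above is not sorted. If a stable, ranked listing is wanted, one possible sketch is to print the count first and pipe the result through sort. The helper name count_words and the temporary sample file are assumptions for illustration; the NF >= 3 guard is an addition that skips lines containing no quoted word.

```shell
# Sample input written to a temporary file (same data as the test file above).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046570'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046579'
EOF

# count_words is a hypothetical helper, not part of the original answer.
# With ' as the field separator, a line holding one quoted word has 3 fields
# and the word itself is $2. Counts are sorted descending, ties by word.
count_words() {
    awk -F\' 'NF >= 3 {c[$2]++} END{for (w in c) print c[w], w}' "$1" |
        sort -k1,1nr -k2,2
}

count_words "$sample"
```

With this sample the listing starts with "3 CONTRIB_046579", followed by "2 CONTRIB_046572", then the two words that occur once.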