
Count pattern occurrences per line

Tags: regex, unix, sed, awk

For each line, the desired output keeps the first two 'columns' and appends the number of occurrences of 'word' on that same line.

Input:

string1 string2 aaaaaaaaa word aaaaaaaa word  
string3 string4 ccccccccccc word dddaaaaaaacccd word dddddaaaaa word bbbb  
string5 string6 aaaa word bbbbbbaddd word aaaaa word ccccccdddddddddd word cccccc

Desired output:

string1 string2 2  
string3 string4 3  
string5 string6 4

Any suggestions?

user1875323 asked Feb 13 '14 20:02


2 Answers

Using awk

awk '{print $1,$2,gsub(/word/,"")}' file
string1 string2 2
string3 string4 3
string5 string6 4

Explanation

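  • The gsub() function returns the number of substitutions made, so replacing every match of /word/ with the empty string yields the count for that line.
  • The regex matches substrings anywhere on the line, so 'sword' or 'words' would also be counted, as would a match in the first two columns.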
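
A minimal sketch of the same counting idiom, assuming a POSIX awk. The function name count_substr and the sample lines are illustrative and not part of the original answer. Here the replacement is "&" (the matched text itself), so the line is left unchanged while gsub() still returns the number of matches:

```shell
# count_substr: hypothetical helper; prints columns 1-2 plus the substring count.
count_substr() {
    # gsub(/word/, "&") replaces each match with itself, so $0 keeps its
    # content and the return value is simply the number of matches.
    awk '{ n = gsub(/word/, "&"); print $1, $2, n }' "$@"
}

# Illustrative input: the second line has no standalone "word",
# but "sword" and "words" each contain it once.
printf '%s\n' \
    'string1 string2 aaa word aaa word' \
    'string3 string4 sword words' | count_substr
# prints:
# string1 string2 2
# string3 string4 2
```

The second line shows the substring behaviour: both lines report 2, even though only the first contains 'word' as a separate column.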
BMW answered Oct 06 '22 22:10


I'm ignoring sed; here's how to do it with awk by comparing each field after the second against "word":

awk '{
    count = 0
    for (i = 3; i <= NF; i++)        # skip the first two columns
        if ($i == "word") count++    # exact whole-field match only
    print $1, $2, count
}' inputfile
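
As a rough sketch of how the field-by-field approach could be reused for other words, the word can be passed in with awk's -v option. The function name count_field and the sample line are assumptions for illustration, not part of the original answer:

```shell
# count_field: hypothetical helper; first argument is the word to count,
# remaining arguments are input files (stdin is read if none are given).
count_field() {
    w=$1
    shift
    awk -v w="$w" '{
        c = 0
        for (i = 3; i <= NF; i++)   # only columns after the first two
            if ($i == w) c++        # whole-field equality, not a substring match
        print $1, $2, c
    }' "$@"
}

# Illustrative input: "sword" and "words" are not counted here.
printf '%s\n' 'string3 string4 sword words word' | count_field word
# prints: string3 string4 1
```

Because the comparison is on whole whitespace-separated fields, this counts only columns that are exactly the given word, which differs from a regex substring count on lines containing words like 'sword'.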
Barmar answered Oct 06 '22 23:10