
awk: count occurrences of a specified character, return 0 if not found

Tags:

awk

I have a text file:

and

b

,

.

apple

banana

I want to count the occurrences of some specific characters, including the semicolon. In this case the file contains no semicolon,

so the expected output would be: semi_colon 0

Here's my code:

sed -e 's/;/semi_colon/g' data.txt|awk '{count[$1]++} if(count[$1]==""){count[$1]=0} END{print"semi_colon",count["semi_colon"]}'

which produces this output:

semi_colon

Wondering how to achieve the expected output:

semi_colon 0

Cheers if anyone can help!

Edcwu asked Dec 30 '25 16:12


2 Answers

With your shown samples, please try the following awk code.

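awk '
{
  # gsub() returns the number of substitutions made; replacing each match
  # with itself ("&") leaves the line unchanged and just counts matches.
  countA += gsub(/a/, "&")
  countSemiColon += gsub(/;/, "&")
}
END {
  # Adding 0 forces a numeric value, so a counter that was never set
  # (e.g. on empty input) still prints as 0.
  print "a " (countA+0) ORS "semi_colon " (countSemiColon+0)
}
' Input_file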

Explanation: In the main block, awk's gsub function globally substitutes a with itself (purely to count its occurrences) and returns the number of substitutions it made, which is added to the countA variable. The += operator accumulates that count across lines, giving the total number of a's in the whole file.

Likewise, countSemiColon accumulates the number of global substitutions of ; on each line, giving the total number of occurrences in the whole Input_file. The END block then prints the values of countA and countSemiColon in the required output format.
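The counting idea described above can be tried without creating a file. This is a minimal sketch that assumes the six sample lines from the question are supplied inline through printf; the wrapper function name count_chars is hypothetical and only there to make the example self-contained.

```shell
# Sample lines from the question, supplied inline instead of via Input_file.
count_chars() {
  printf '%s\n' and b , . apple banana |
  awk '
  {
    # gsub() returns how many replacements it made on this line.
    countA         += gsub(/a/, "&")
    countSemiColon += gsub(/;/, "&")
  }
  END {
    print "a", countA+0
    print "semi_colon", countSemiColon+0
  }'
}
count_chars
```

With that input, this should print a 5 followed by semi_colon 0.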

RavinderSingh13 answered Jan 01 '26 18:01


$ cat tst.awk
{
    for ( i=1; i<=length($0); i++ ) {
        cnt[substr($0,i,1)]++
    }
}
END {
    print "semi_colon", cnt[";"]+0
}

$ awk -f tst.awk file
semi_colon 0

and for multiple chars:

$ cat tst.awk
{
    for ( i=1; i<=length($0); i++ ) {
        cnt[substr($0,i,1)]++
    }
}
END {
    chars = "a;."
    map[";"] = "semi_colon"
    for ( i=1; i<=length(chars); i++ ) {
        char = substr(chars,i,1)
        print (char in map ? map[char] : char), cnt[char]+0
    }
}

$ awk -f tst.awk file
a 5
semi_colon 0
. 1
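The +0 in the scripts above is what turns a missing count into 0. A minimal sketch of that behaviour, assuming nothing beyond standard awk semantics (the shell variable unset_demo is a hypothetical name used only for this illustration): an array element that was never assigned is an empty string in string context and 0 in numeric context.

```shell
unset_demo=$(awk 'BEGIN {
  # cnt[";"] was never assigned, so it is uninitialized.
  print "[" cnt[";"] "]"        # string context: empty string
  print "[" (cnt[";"] + 0) "]"  # numeric context: 0
}')
echo "$unset_demo"
```

This prints [] and then [0]. The first line is the same effect as the blank after semi_colon in the question's output.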

Regarding your original code, sed -e 's/;/semi_colon/g' data.txt|awk...: you never need sed when you're using awk. Doing that would also break your script if you wanted it to count any of the characters present in the string semi_colon (e.g. _ or e), or if the input already contained the string semi_colon, since you'd then have no way to differentiate that from an original ;. In general, when mapping symbols to strings is desirable, do it in the output, not in the input.
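To illustrate the pitfall of mapping in the input, here is a small sketch using a hypothetical one-line input, a;e, which is not from the question. It counts the letter e twice: once after the sed substitution and once on the raw input.

```shell
# Hypothetical one-line input containing one ";" and one "e".
input='a;e'

# Mapping in the input: the inserted text "semi_colon" adds an extra "e".
mapped=$(printf '%s\n' "$input" | sed 's/;/semi_colon/g' |
         awk '{ n += gsub(/e/, "&") } END { print n+0 }')

# Counting on the raw input: only the original "e" is counted.
raw=$(printf '%s\n' "$input" | awk '{ n += gsub(/e/, "&") } END { print n+0 }')

echo "mapped=$mapped raw=$raw"
```

This should print mapped=2 raw=1: the substituted text inflates the count, which is why the mapping is better done when printing.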

Ed Morton answered Jan 01 '26 18:01


