
Word Count using AWK

Tags:

awk

I have a file like the one below:

this is a sample file
this file will be used for testing

I want to count the words using AWK.

The expected output is:

this 2
is 1
a 1
sample 1
file 2
will 1
be 1
used 1
for 1

I have written the AWK command below, but I am getting some errors:

cat anyfile.txt|awk -F" "'{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}'
Koushik Chandra asked Feb 20 '15 12:02



2 Answers

It works fine for me:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile
used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1

PS: You do not need to set -F" ", since by default awk already splits fields on any run of blanks (spaces and tabs).
PS2: Do not use cat with programs that can read files themselves, such as awk.
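For comparison, here is a minimal sketch of the question's command with -F" " kept but separated from the program by a space. Without that space the shell joins -F" " and the quoted program into a single argument, so awk receives no program text at all, which is the likely source of the error. The function name count_words and the temporary file are assumptions made for illustration only.

```shell
# Hypothetical helper: count words in the file given as $1.
# The space after -F" " matters: it keeps the field separator and the
# program as two separate arguments.
count_words() {
    awk -F" " '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' "$1"
}

# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'this is a sample file' 'this file will be used for testing' > "$sample"

# "for (k in a)" visits keys in an unspecified order, so sort for stable output.
count_words "$sample" | sort
rm -f "$sample"
```

Passing the file name to awk directly also removes the need for cat.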

You can pipe the output to sort to order it by count:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile | sort -k 2 -n
a 1
be 1
for 1
is 1
sample 1
testing 1
used 1
will 1
file 2
this 2
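If the most frequent words should come first, the sort keys can be adjusted. The following is only a sketch: the function name count_words_desc is made up for illustration, and LC_ALL=C is an assumption to keep the tie-breaking order predictable across locales.

```shell
# Hypothetical helper: count words in $1, most frequent first,
# ties broken alphabetically by word.
count_words_desc() {
    awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' "$1" |
        LC_ALL=C sort -k2,2nr -k1,1
}

# Recreate the sample input in a temporary file and run the helper.
sample=$(mktemp)
printf '%s\n' 'this is a sample file' 'this file will be used for testing' > "$sample"
count_words_desc "$sample"
rm -f "$sample"
```

Here -k2,2nr sorts numerically on the count column in reverse, and -k1,1 orders words with equal counts.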
Jotne answered Oct 29 '22 06:10


Instead of looping over the fields of each line and saving each word in the array ({for(i=1;i<=NF;i++) a[$i]++}), you can use gawk, which supports a multi-character (regular expression) RS (Record Separator). Each word then becomes its own record and is counted directly, as follows (it is slightly faster):

gawk '{a[$0]++} END{for (k in a) print k,a[k]}' RS='[[:space:]]+' file

Output:

used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1

In the gawk command above, I define the whitespace character class [[:space:]]+ (one or more whitespace characters, including newlines) as the record separator.
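A regular expression RS is not guaranteed by POSIX awk, so where gawk is unavailable the same one-word-per-record idea can be approximated with tr. This is a different technique from the gawk command, shown only as a sketch; the function name count_words_portable is an assumption.

```shell
# Hypothetical helper: put one word on each line with tr, then count lines in awk.
# The NF guard skips the empty line that appears when the file starts with whitespace.
count_words_portable() {
    tr -s '[:space:]' '\n' < "$1" |
        awk 'NF {a[$0]++} END {for (k in a) print k, a[k]}'
}

# Recreate the sample input in a temporary file and run the helper.
sample=$(mktemp)
printf '%s\n' 'this is a sample file' 'this file will be used for testing' > "$sample"
count_words_portable "$sample" | sort
rm -f "$sample"
```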

αғsнιη answered Oct 29 '22 06:10