Word Count using AWK

Tags:

awk

I have a file like the one below:

this is a sample file
this file will be used for testing

I want to count how many times each word occurs, using AWK.

The expected output is:

this 2
is 1
a 1
sample 1
file 2
will 1
be 1
used 1
for 1

I have written the AWK command below, but I am getting some errors:

cat anyfile.txt|awk -F" "'{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}'

asked Feb 20 '15 12:02

Koushik Chandra

2 Answers

It works fine for me:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile
used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1
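
Note that the order above differs from the order in the question, because awk does not guarantee any particular order for a for (k in a) loop. If the words should appear in the order they are first seen, one possible sketch is to record that order in a second array. The names sample, order, count and n below are illustrative assumptions and are not part of the original command:

```shell
# Hypothetical sample file with the same two lines as in the question.
sample=$(mktemp)
printf 'this is a sample file\nthis file will be used for testing\n' > "$sample"

ordered=$(awk '
{
    for (i = 1; i <= NF; i++) {
        # Remember each word the first time it appears.
        if (!($i in count)) order[++n] = $i
        count[$i]++
    }
}
END {
    # Walk the words in first-seen order instead of using for (k in count).
    for (j = 1; j <= n; j++) print order[j], count[order[j]]
}' "$sample")

printf '%s\n' "$ordered"
rm -f "$sample"
```

With the sample input this prints the words in the order listed in the question, followed by testing 1, which the expected output there leaves out.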

PS: You do not need to set -F" ", since the default field separator already splits on any run of blanks.
PS2: Do not use cat with programs that can read files themselves, such as awk.
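
A likely cause of the errors in the question is the missing space between -F" " and the quoted program. The shell joins adjacent quoted strings into a single word, so awk receives the program as part of the -F value and is left with no program text to run. The sketch below first shows that joining with printf and then runs the command with the space restored. The variable names and the temporary file are assumptions made for illustration:

```shell
# Adjacent quoted strings become ONE argument: printf is given a single word here.
joined=$(printf '[%s]\n' -F" "'{print $1}')
printf '%s\n' "$joined"

# Hypothetical sample file with the same two lines as in the question.
sample=$(mktemp)
printf 'this is a sample file\nthis file will be used for testing\n' > "$sample"

# With a space before the program, -F" " and the program are separate arguments.
# The sort is only there to make the output order predictable.
fixed=$(awk -F" " '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' "$sample" |
    LC_ALL=C sort)

printf '%s\n' "$fixed"
rm -f "$sample"
```

The first printf shows one bracketed argument containing both the -F value and the braces, which is what awk saw in the original command.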

You can pipe the output through sort to order it by count:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' testfile | sort -k 2 -n
a 1
be 1
for 1
is 1
sample 1
testing 1
used 1
will 1
file 2
this 2
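
In the command above, -k 2 starts the sort key at the second field and extends it to the end of the line, and equal counts fall back to a comparison of the whole line, which is why the ties come out alphabetically. As a variation, assuming the most frequent words should come first, the key can be limited to the count column and reversed, with the word as an explicit tie-breaker. The temporary file and variable names are illustrative:

```shell
# Hypothetical sample file with the same two lines as in the question.
sample=$(mktemp)
printf 'this is a sample file\nthis file will be used for testing\n' > "$sample"

# -k2,2nr : numeric, descending, on the count column only.
# -k1,1   : break ties alphabetically on the word.
# LC_ALL=C keeps the ordering independent of the current locale.
ranked=$(awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' "$sample" |
    LC_ALL=C sort -k2,2nr -k1,1)

printf '%s\n' "$ranked"
rm -f "$sample"
```

For the sample input, file 2 and this 2 are listed first, followed by the words that occur once in alphabetical order.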

answered Oct 29 '22 06:10

Jotne

Instead of looping over the fields of each line and counting them in an array ({for(i=1;i<=NF;i++) a[$i]++}), you can use gawk, which supports a multi-character RS (record separator). Each word then becomes its own record and can be counted directly, as follows (it is a little faster):

gawk '{a[$0]++} END{for (k in a) print k,a[k]}' RS='[[:space:]]+' file

Output:

used 1
this 2
be 1
a 1
for 1
testing 1
file 2
will 1
sample 1
is 1

In the gawk command above, I define the expression [[:space:]]+ (one or more whitespace characters, including spaces and \n newline characters) as the record separator.

answered Oct 29 '22 06:10

αғsнιη
