
Match words in word-list and count occurrences

Tags: grep, bash, list, sed, awk

So I have a general text file with some writing in it (the content varies quite randomly), and I also have a word list. I want to compare the two and count how many times each word from the word list appears in the text file.

For example, my word list could contain:

good
bad
cupid
banana
apple

Then I want to compare each of these words with my text file, which might look like this:

Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.

I want the output to show how many times each of the listed words occurs. I have a way to do this in awk with a for-loop, but I would really like to avoid the loop: my real word list is about 10,000 words long, so it would take forever.

So in this case the output should be (I think) 9, since it counts the total occurrences of all words on the list (this counts "apples" and "bananas" as matches for "apple" and "banana").

By the way, the paragraph was totally random.
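Whether the expected total is 9 depends on whether plural forms count as matches. A minimal sketch, assuming GNU or BSD grep and a word list saved as word.list without trailing spaces, compares substring matching with whole-word matching (-w); each variant reads all patterns at once with -f, so the text is scanned once rather than once per word:

```shell
#!/usr/bin/env bash
# Sample data from the question. The word list is written without trailing
# spaces: with -F -f, a pattern "bad " would only match "bad" followed by a space.
printf '%s\n' good bad cupid banana apple > word.list
cat > input.txt <<'EOF'
Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.
EOF

# Substring matching: "apples" and "bananas" contain "apple" and "banana".
# -o prints every match on its own line, -F treats patterns as fixed strings,
# -f reads all patterns from word.list.
substring_total=$(grep -oF -f word.list input.txt | wc -l)

# Whole-word matching (-w): "apples" and "bananas" are no longer counted.
word_total=$(grep -owF -f word.list input.txt | wc -l)

echo "substring matches: $substring_total"
echo "whole-word matches: $word_total"
```

With the sample text the substring count is 9 and the whole-word count is 7.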

asked Jan 12 '23 18:01 by CrudeCoder


2 Answers

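For small to medium-sized texts you could use grep in combination with wc (this runs grep once per listed word and counts substring matches, so "apples" counts toward "apple"):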

cat <<EOF > word.list
good
bad
cupid
banana
apple
EOF

cat <<EOF > input.txt
Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.
EOF

while read -r search; do
    # -o prints each match on its own line; -F treats the word as a fixed string
    echo "$search: $(grep -oF -- "$search" input.txt | wc -l)"
done < word.list | awk '{total += $2; print} END {printf "total: %s\n", total}'

Output:

good: 3
bad: 2
cupid: 1
banana: 1
apple: 2
total: 9
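To avoid starting one grep process per word (the concern with a 10,000-word list), the per-word counts can also come from a single grep pass. A minimal sketch, assuming the same word.list and input.txt (recreated here so the snippet is self-contained); count_matches is a hypothetical helper name:

```shell
#!/usr/bin/env bash
printf '%s\n' good bad cupid banana apple > word.list
cat > input.txt <<'EOF'
Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.
EOF

# Hypothetical helper: $1 = word list, $2 = text file.
count_matches() {
  # One grep pass prints every substring match on its own line;
  # sort | uniq -c counts identical matches, awk formats them and sums.
  grep -oF -f "$1" "$2" | sort | uniq -c |
    awk '{ print $2 ": " $1; total += $1 } END { print "total: " total + 0 }'
}

count_matches word.list input.txt
```

Unlike the loop, words that never occur are not listed at all, and the lines come out in sorted order rather than word-list order.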
answered Jan 20 '23 01:01 by hek2mgl


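For any bigger text I would definitely use Perl, which reads the word list into a hash once and then scans the input in a single pass (it matches whole words case-insensitively, so "apples" and "bananas" are not counted):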

perl -nE'BEGIN{open my$fh,"<",shift;my@a=map lc,map/(\w+)/g,<$fh>;@h{@a}=(0)x@a;close$fh}exists$h{$_}and$h{$_}++for map lc,/(\w+)/g}{for(keys%h){say"$_: $h{$_}";$s+=$h{$_}}say"Total: $s"' word.list input.txt
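The same idea as the Perl one-liner (a hash of listed words, then one pass over the text) can be written in awk, which the question mentions. A minimal sketch, assuming ASCII text and a non-empty word list (the NR == FNR idiom misbehaves if word.list is empty); count_words is a hypothetical helper name:

```shell
#!/usr/bin/env bash
printf '%s\n' good bad cupid banana apple > word.list
cat > input.txt <<'EOF'
Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.
EOF

# Hypothetical helper: $1 = word list, $2 = text file.
count_words() {
  awk '
    # NR == FNR holds only while reading the first file (the word list):
    # store each lowercased word as a hash key with a count of 0.
    NR == FNR { if (NF) count[tolower($1)] = 0; next }
    {
      # Lowercase the line and turn runs of non-word characters into spaces.
      line = tolower($0)
      gsub(/[^a-z0-9_]+/, " ", line)
      n = split(line, words, " ")
      # Increment only words that are already keys (i.e. on the list).
      for (i = 1; i <= n; i++)
        if (words[i] in count) count[words[i]]++
    }
    END {
      for (w in count) { print w ": " count[w]; total += count[w] }
      print "total: " total + 0
    }
  ' "$1" "$2"
}

count_words word.list input.txt
```

Like the Perl version, this counts whole words only, so the sample gives a total of 7, and the order of the per-word lines is unspecified.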
answered Jan 20 '23 00:01 by Hynek -Pichi- Vychodil