I have two text files. File1 looks like this:
apple
dog
cat
..
..
and File2 looks like this:
appledogtree
dog
catapple
apple00001
..
..
I want to count the occurrences of each word from File1 in File2 and get a result like the one below:
(words in File1, number of occurrences in File2)
apple 3
dog 2
cat 1
How can I do this from the Bash command line?
You can use fgrep to do this efficiently:
fgrep -of f1.txt f2.txt | sort | uniq -c | awk '{print $2 " " $1}'
Gives this output:
apple 3
cat 1
dog 2
fgrep -of f1.txt f2.txt
    extracts all the matching parts (the -o option) of f2.txt based on the patterns in f1.txt
sort | uniq -c
    counts the matching patterns
awk '{print $2 " " $1}'
    swaps the order of the words in the uniq -c output

Given:
$ cat f1.txt
apple
dog
cat
$ cat f2.txt
appledogtree
dog
catapple
apple00001
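For illustration, here is roughly what the first two stages of that pipeline produce with these files (the exact ordering of the sort output may vary with locale):
$ fgrep -of f1.txt f2.txt
apple
dog
dog
cat
apple
apple
$ fgrep -of f1.txt f2.txt | sort | uniq -c
      3 apple
      1 cat
      2 dog
The final awk stage then just reverses each line into the "word count" form shown above.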
Try:
while IFS= read -r line || [[ -n $line ]]; do
    printf "%s->%s\n" "$line" "$(grep -c "$line" f2.txt)"
done <f1.txt
Prints:
apple->3
dog->2
cat->1
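One caveat, which only matters if a word can occur more than once on a single line of f2.txt: grep -c counts matching lines, not individual matches. To count every occurrence instead, a sketch using grep -o (which prints each match on its own line) piped to wc -l:
while IFS= read -r line || [[ -n $line ]]; do
    printf "%s->%s\n" "$line" "$(grep -o "$line" f2.txt | wc -l)"
done <f1.txt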
If you want a pipeline, you can do:
cat f1.txt | xargs | sed -e 's/ /\|/g' | grep -Eof /dev/stdin f2.txt | awk '{a[$1]++} END{for (x in a) print x, a[x]}'
Which does:
cat f1.txt
    puts the contents of the file on stdin
xargs
    joins those lines into a single line
sed -e 's/ /\|/g'
    joins the words into "apple|dog|cat"
grep -Eof /dev/stdin f2.txt
    uses that alternation pattern to print the matching parts of f2.txt
awk '{a[$1]++} END{for (x in a) print x, a[x]}'
    counts the words and prints the counts

With GNU grep, you can do grep -Eof - f2.txt instead of reading the pattern from /dev/stdin.
That pipeline works on POSIX systems as well as on Linux.
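As an aside, the three pattern-building stages (cat, xargs, sed) can be collapsed into a single paste call, which joins the lines of a file with a chosen delimiter; a sketch of the same pipeline, still relying on GNU grep for -f -:
$ paste -sd'|' f1.txt
apple|dog|cat
$ paste -sd'|' f1.txt | grep -Eof - f2.txt | awk '{a[$1]++} END{for (x in a) print x, a[x]}'
(The output order of the for (x in a) loop is unspecified in awk, so the result lines may appear in any order.)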
If you want pure efficiency, just use awk:
awk 'NR==FNR {pat[FNR]=$1; next}                          # read the search words from f1.txt
     {for (i in pat) if (match($0, pat[i])) m[pat[i]]++}  # count the lines of f2.txt matching each word
     END {for (e in m) print e, m[e]}' f1.txt f2.txt
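One caveat: match() treats the entries of f1.txt as regular expressions, so a word containing a metacharacter such as . or * would not be matched literally. If the words should be treated as fixed strings, a sketch using index() instead:
awk 'NR==FNR {pat[FNR]=$1; next}
     {for (i in pat) if (index($0, pat[i])) m[pat[i]]++}
     END {for (e in m) print e, m[e]}' f1.txt f2.txt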
In awk:
$ awk 'NR==FNR { a[$1]; next } # read in all search words
{ for(i in a) a[i]+=gsub(i,i) } # count matches of all keywords in record
END{ for(i in a) print i,a[i] } # output results
' file1 file2
apple 3
cat 1
dog 2
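Note that gsub(i, i) replaces every match of i with itself and returns the number of substitutions made, so unlike the grep -c approach this counts repeated occurrences within a single line. A quick illustration with made-up input:
$ echo "appleapple apple" | awk '{print gsub(/apple/, "apple")}'
3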