In one text file, I have 150 words. I have another text file, which has about 100,000 lines.
How can I check, for each word in the first file, whether it appears in the second file or not?
I thought about using grep, but I could not find out how to use it to read each of the words in the original text.
Is there any way to do this using awk? Or another solution?
I tried with this shell script, but it matches almost every line:
#!/usr/bin/env sh
cat words.txt | while read line; do
    if grep -F "$FILENAME" text.txt
    then
        echo "Found $line"
    fi
done
Another way I found is:
fgrep -w -o -f "words.txt" "text.txt"
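A likely reason the loop in the question matches almost every line: it reads each word into line but passes $FILENAME to grep, and that variable is never set. An empty fixed string matches every line. Below is a minimal sketch of a corrected loop, assuming one word per line in the word list; report_words is a hypothetical helper name, not part of the original script.

```shell
# Sketch: report which words from a word list occur in a text file.
# report_words is a hypothetical helper; $1 = word list, $2 = text file.
report_words() {
    while IFS= read -r word; do
        # Skip empty lines: an empty fixed string would match every line.
        [ -n "$word" ] || continue
        # -q: exit status only, -F: literal string, -w: whole words only.
        if grep -qFw -- "$word" "$2"; then
            echo "Found: $word"
        else
            echo "Missing: $word"
        fi
    done < "$1"
}
```

It could be called as report_words words.txt text.txt. This runs grep once per word, so it reads the large file up to 150 times, which is slower than a single grep -f pass but reports every word explicitly.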
The grep command searches files for lines that match a given pattern. To use it, type grep, then the pattern to search for, and finally the name of the file (or files) to search in.
grep can search files and directories for specific text. It is also commonly used in a pipeline, with the output of another command piped into it as input.
On Linux and Unix systems, grep (short for "global regular expression print") prints the lines of its input that match a regular expression.
You can use grep -f:
grep -Ff "first-file" "second-file"
Or, to match whole words only:
grep -w -Ff "first-file" "second-file"
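The commands above print the matching lines of the second file. To answer the question per word, a possible sketch is to print only the matched words and then derive the missing ones. It assumes GNU grep (for -o and for reading patterns from standard input with -f -); found_words and missing_words are hypothetical helper names.

```shell
# Sketch, assuming GNU grep. $1 = word list, $2 = text file.
found_words() {
    # -o prints only the matched text, so every output line is a word
    # from the list; sort -u removes repeated hits.
    grep -owFf "$1" "$2" | sort -u
}

missing_words() {
    # Use the found words as fixed, whole-line patterns (-x -F) and
    # invert the match (-v) to keep list entries that were never found.
    found_words "$1" "$2" | grep -vxFf - "$1"
}
```

For example, found_words first-file second-file lists each word that occurs at least once, and missing_words first-file second-file lists the rest. The large file is read only once.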
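UPDATE: As per the comments, an awk version that prints each word from file1 the first time it appears as the first field of a line in file2: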
awk 'FNR==NR{a[$1]; next} ($1 in a){delete a[$1]; print $1}' file1 file2
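The awk one-liner above compares only the first whitespace-separated field of each line in file2. If the words can appear anywhere on a line, a variant that loops over every field might look like the sketch below. words_in_any_field is a hypothetical helper name, and the sketch assumes words are separated by whitespace, so a word with punctuation attached (such as dog!!!) is not matched.

```shell
# Sketch: print each listed word the first time it appears as any
# whitespace-separated field. $1 = word list, $2 = text file.
words_in_any_field() {
    awk 'FNR == NR { want[$1]; next }       # first file: remember words
         {
             for (i = 1; i <= NF; i++)      # second file: test each field
                 if ($i in want) {
                     print $i
                     delete want[$i]        # report each word only once
                 }
         }' "$1" "$2"
}
```

It could be called as words_in_any_field file1 file2. Because found words are deleted from the array, each word is printed at most once.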
Use grep like this:
grep -f firstfile secondfile
SECOND OPTION
Thank you to Ed Morton for pointing out that grep -f treats the words in the file "reserved" as patterns. If that is an issue (it may or may not be), the OP can use something like this, which does plain substring matching instead of pattern matching:
File "reserved"
cat
dog
fox
and file "text"
The cat jumped over the lazy
fox but didn't land on the
moon at all.
However it did land on the dog!!!
The awk script looks like this:
awk 'BEGIN{i=0}FNR==NR{res[i++]=$1;next}{for(j=0;j<i;j++)if(index($0,res[j]))print $0}' reserved text
with output:
The cat jumped over the lazy
fox but didn't land on the
However it did land on the dog!!!
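One thing to be aware of: the script above prints a line once for every reserved word it contains, so a line with two of the words appears twice. A possible variant that stops at the first hit is sketched below; match_lines_once is a hypothetical helper name. Matching is still by substring, so cat would also match inside catalog.

```shell
# Sketch: print each line of the text at most once if it contains any
# reserved word as a substring. $1 = reserved words, $2 = text file.
match_lines_once() {
    awk 'FNR == NR { res[n++] = $1; next }  # first file: store the words
         {
             for (j = 0; j < n; j++)
                 if (index($0, res[j])) {   # literal substring test
                     print
                     break                  # one hit is enough
                 }
         }' "$1" "$2"
}
```

It could be called as match_lines_once reserved text.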
THIRD OPTION
Alternatively, it can be done quite simply, but more slowly, in bash:
while IFS= read -r r; do grep -- "$r" secondfile; done < firstfile