I have two files.
file1.txt:
Afghans
Africans
Alaskans
...
where file2.txt
contains the output from a wget on a webpage, so it's a big sloppy mess, but does contain many of the words from the first list.
Bashscript:
cat file1.txt | while read LINE; do grep $LINE file2.txt; done
This did not work as expected. I wondered why, so I echoed out the $LINE variable inside the loop and added a sleep 1, so i could see what was happening:
cat file1.txt | while read LINE; do echo $LINE; sleep 1; grep $LINE file2.txt; done
The output looks in terminal looks something like this:
Afghans
Africans
Alaskans
Albanians
Americans
grep: Chinese: No such file or directory
: No such file or directory
Arabians
Arabs
Arabs/East Indians
: No such file or directory
Argentinans
Armenians
Asian
Asian Indians
: No such file or directory
file2.txt: Asian Naruto
...
So you can see it did finally find the word "Asian". But why does it say:
No such file or directory
?
Is there something weird going on or am I missing something here?
What about
grep -f file1.txt file2.txt
@OP, First, use dos2unix
as advised. Then use awk
awk 'FNR==NR{a[$1];next}{ for(i=1;i<=NF;i++){ if($i in a) {print $i} } } ' file1 file2_wget
Note: using while loop and grep inside the loop is not efficient, since for every iteration, you need to invoke grep
on the file2.
@OP, crude explanation:
For meaning of FNR and NR, please refer to gawk manual. FNR==NR{a[1];next}
means getting the contents of file1 into array a
. when FNR is not equal to NR (which means reading the 2nd file now), it will check if each word in the file is in array a
. If it is, print out. (the for loop is used to iterate each word)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With