Good day shell lovers!
Basically, I have two files:
frequency.txt (multiple lines, space-separated, each holding a word and its frequency):
de 1711
a 936
et 762
la 530
les 482
pour 439
le 425
...
And I have a file containing "prohibited" words:
stopwords.txt (a single line, space-separated):
au aux avec le ces dans ...
So I want to delete from frequency.txt every line whose word appears in stopwords.txt.
How could I do that? I'm thinking it could be done with awk, something like
awk 'match($0,SOMETHING_MAGICAL_HERE) == 0 {print $0}' frequency.txt > new.txt
but I'm not really sure. Any ideas? Thanks in advance.
$ awk 'FNR==NR{for(i=1;i<=NF;i++)w[$i];next}(!($1 in w))' stop.txt freq.txt
de 1711
a 936
et 762
la 530
les 482
pour 439
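For anyone puzzling over that one-liner, here is a commented sketch of the same idea. The scratch directory, the shortened sample data, the function name filter_stopwords and the array name stop are just illustration choices, not part of the original answer; the file names follow the question. One caveat: the FNR==NR trick assumes the stopword file is not empty.

```shell
# Build small sample files in a scratch directory (contents abbreviated from the question).
tmp=$(mktemp -d)
printf 'de 1711\na 936\nle 425\nles 482\n' > "$tmp/frequency.txt"
printf 'au aux avec le ces dans\n' > "$tmp/stopwords.txt"

# filter_stopwords STOPFILE FREQFILE (hypothetical helper name)
filter_stopwords() {
  awk '
    # FNR == NR is only true while awk reads the first file (the stopwords).
    # Store every field of that file as a key in the array "stop".
    FNR == NR { for (i = 1; i <= NF; i++) stop[$i]; next }
    # Second file: print the line only if its first field is not a stopword.
    !($1 in stop)
  ' "$1" "$2"
}

result=$(filter_stopwords "$tmp/stopwords.txt" "$tmp/frequency.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the lookup is an exact match on the first field, "le" is removed while "les" is kept.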
This will do it for you:
tr ' ' '\n' <stopwords.txt | grep -v -w -F -f - frequency.txt
-v inverts the match, so only lines that do not match are printed
-w matches whole words only
-F treats the patterns as a set of newline-separated fixed strings
-f - reads the pattern strings from standard input, here the output of tr
grep expects one pattern per line, but stopwords.txt is a single space-delimited line, which is why tr replaces the spaces with newlines first.
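To see why -w matters, here is a small sketch that runs the pipeline with and without it. The scratch directory, the shortened sample data and the variable names with_w and without_w are made up for the demo, and it assumes a grep that accepts "-f -" for standard input (GNU and BSD grep do).

```shell
# Sample data: "le" is a stopword, "les" is not.
tmp=$(mktemp -d)
printf 'de 1711\nle 425\nles 482\n' > "$tmp/frequency.txt"
printf 'au le dans\n' > "$tmp/stopwords.txt"

# With -w, the pattern "le" only matches the whole word "le".
with_w=$(tr ' ' '\n' < "$tmp/stopwords.txt" | grep -v -w -F -f - "$tmp/frequency.txt")

# Without -w, "le" also matches inside "les", so that line is dropped too.
without_w=$(tr ' ' '\n' < "$tmp/stopwords.txt" | grep -v -F -f - "$tmp/frequency.txt")

printf 'with -w:\n%s\n' "$with_w"
printf 'without -w:\n%s\n' "$without_w"
rm -rf "$tmp"
```

With -w the "les 482" line survives; without it, only "de 1711" is left.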