shell to filter prohibited words on a file

Question

Good day shell lovers!

Basically, I have two files:

frequency.txt (multiple lines, each holding a word and its frequency, separated by a space):

de 1711
a 936
et 762
la 530
les 482
pour 439
le 425
...

And I have a file containing "prohibited" words:

stopwords.txt (one single line, space-separated):

 au aux avec le ces dans ...

So I want to delete from frequency.txt every line whose word is found in stopwords.txt.

How could I do that? I'm thinking it could be done with awk, something like:

awk 'match($0,SOMETHING_MAGICAL_HERE) == 0 {print $0}' frequency.txt > new.txt

But I'm not really sure. Any ideas? Thanks in advance.

ghostdog74 · Accepted Answer

$ awk 'FNR==NR{for(i=1;i<=NF;i++)w[$i];next}(!($1 in w))' stopwords.txt frequency.txt
de 1711
a 936
et 762
la 530
les 482
pour 439
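For readers who find the one-liner dense, here is the same logic spread out with comments. This is only a sketch: `filter_stopwords` is a hypothetical helper name, the array is renamed from `w` to `stop` for readability, and the sample files are recreated from the question in a temporary directory.

```shell
#!/bin/sh
# filter_stopwords STOPFILE FREQFILE  (hypothetical helper, not from the answer)
filter_stopwords() {
    awk '
        # FNR == NR is true only while awk reads the first file (stop words).
        FNR == NR {
            # Referencing stop[word] is enough to create the key.
            for (i = 1; i <= NF; i++) stop[$i]
            next
        }
        # Second file: print the line unless its first field is a stop word.
        !($1 in stop)
    ' "$1" "$2"
}

# Usage with the sample data from the question.
tmpdir=$(mktemp -d)
printf ' au aux avec le ces dans\n' > "$tmpdir/stopwords.txt"
printf 'de 1711\na 936\net 762\nla 530\nles 482\npour 439\nle 425\n' \
    > "$tmpdir/frequency.txt"
filter_stopwords "$tmpdir/stopwords.txt" "$tmpdir/frequency.txt"
rm -r "$tmpdir"
```

Only the first field is compared, so "les" survives even though "le" is a stop word. One caveat: if the stop-word file is empty, `FNR == NR` stays true for the second file as well and nothing is printed.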

Michael Goldshteyn · Answer

This will do it for you:

tr ' ' '\n' <stopwords.txt | grep -v -w -F -f - frequency.txt

-v inverts the match
-w matches whole words only
-F treats the patterns as a set of newline-separated fixed strings
-f - reads the pattern strings from standard input, here the output of tr

tr is needed because stopwords.txt is space-delimited on a single line, while grep expects one pattern per line, so tr replaces the spaces with newlines first.
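A slightly more defensive variant of the tr and grep pipeline, as a sketch. It assumes the leading space shown in the stopwords.txt sample is really in the file, in which case tr emits an empty first line. The extra `grep -v '^$'` drops empty lines so that grep is never handed an empty pattern, and `tr -s` squeezes runs of spaces. `filter_with_grep` is a hypothetical helper name.

```shell
#!/bin/sh
# filter_with_grep STOPFILE FREQFILE  (hypothetical helper name)
filter_with_grep() {
    # 1. tr -s: turn each run of spaces into a single newline.
    # 2. grep -v '^$': discard empty lines left by leading spaces.
    # 3. grep -v -w -F -f -: drop lines containing any stop word as a whole word.
    tr -s ' ' '\n' < "$1" | grep -v '^$' | grep -v -w -F -f - "$2"
}

# Usage with a small sample in a temporary directory.
tmpdir=$(mktemp -d)
printf ' au aux avec le ces dans\n' > "$tmpdir/stopwords.txt"
printf 'de 1711\nles 482\nle 425\n' > "$tmpdir/frequency.txt"
filter_with_grep "$tmpdir/stopwords.txt" "$tmpdir/frequency.txt"
rm -r "$tmpdir"
```

Unlike a comparison on the first field only, -w matches a stop word anywhere on the line, and it treats anything other than letters, digits, and underscore as a word boundary. A line such as "au-dessus 12" would therefore be removed because it contains "au" as a whole word.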

Tags: linux, shell, filter, awk

pleasedontbelong
