
How can I remove stop words from a sentence using a shell script? [duplicate]

Tags: bash, shell, sed, tr

I'm trying to remove stop words from the sentences in a file.

The stop words I mean are:
[I, a, an, as, at, the, by, in, for, of, on, that]

I have this sentence in the file my_text.txt:

One of the primary goals in the design of the Unix system was to create an environment that promoted efficient program

I want to remove the stop words from the sentence above.

I used this script:

array=( I a an as at the by in for of on that  )
for i in "${array[@]}"
do
cat $p  | sed -e 's/\<$i\>//g' 
done < my_text.txt

But the output is:

One of the primary goals in the design of the Unix system was to create an environment that promoted efficient program

The expected output is:

One primary goals design Unix system was to create environment promoted efficient program

Note: I want to remove stop words, not duplicated words.

Abdallah_98 asked Dec 16 '20 22:12


1 Answer

Like this, assuming $p is an existing file:

 sed -i -e "s/\<$i\>//g" "$p"

You have to use double quotes, not single quotes, so that the shell expands the variable before sed sees the expression.

The -i switch makes sed edit the file in place.
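
The quoting difference can be checked without touching any file. The following is only a small sketch: it assumes GNU sed (the \< and \> word boundaries are GNU extensions), and the variable names line, single and double are made up for the illustration.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed, because \< and \> are GNU word-boundary extensions.
i=the
line='One of the primary goals'

# Single quotes: sed receives the literal text $i, which never occurs in the line.
single=$(printf '%s\n' "$line" | sed -e 's/\<$i\>//g')

# Double quotes: the shell replaces $i with "the" before sed runs.
double=$(printf '%s\n' "$line" | sed -e "s/\<$i\>//g")

printf 'single quotes: %s\n' "$single"   # line is unchanged
printf 'double quotes: %s\n' "$double"   # "the" is gone, both surrounding spaces remain
```

With single quotes the line comes back untouched, which is the behaviour described in the question.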

Learn how to quote properly in the shell; it is very important:

"Double quote" every literal that contains spaces/metacharacters and every expansion: "$var", "$(command "$var")", "${array[@]}", "a & b". Use 'single quotes' for code or literal $'s: 'Costs $5 US', ssh host 'echo "$HOSTNAME"'. See
http://mywiki.wooledge.org/Quotes
http://mywiki.wooledge.org/Arguments
http://wiki.bash-hackers.org/syntax/words
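
To make those rules concrete, here is a rough sketch that counts how many arguments a command receives under each quoting style. count_args is a hypothetical helper written for this example, and the default IFS (space, tab, newline) is assumed.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

var='my text.txt'

# Unquoted expansion is split on whitespace (intentionally unquoted here).
unquoted=$(count_args $var)
# Quoted expansion stays a single argument.
quoted=$(count_args "$var")

array=( 'a b' c )
# "${array[@]}" yields exactly one argument per array element.
per_element=$(count_args "${array[@]}")

# Single quotes keep the dollar sign literal.
literal='Costs $5 US'

printf 'unquoted=%s quoted=%s per_element=%s\n' "$unquoted" "$quoted" "$per_element"
printf '%s\n' "$literal"
```

The unquoted case is why cat $p in the question would also break on a file name that contains spaces.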

Finally, the whole loop:

array=( I a an as at the by in for of on that  )
for i in "${array[@]}"
do
    sed -i -e "s/\<$i\>\s*//g" Input_File 
done
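
The loop above rewrites the file once per stop word. As a possible alternative (not part of the original answer), the words can be joined into one alternation so that sed makes a single pass and leaves the input file alone. This is a sketch that assumes GNU sed (-E together with \<, \> and \s); pattern and remove_stop_words are names invented for the example.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed, which accepts \<, \> and \s in extended (-E) expressions.
array=( I a an as at the by in for of on that )

# Join the array elements with "|" to get: I|a|an|as|...|that
# IFS is changed only inside the command substitution's subshell.
pattern=$(IFS='|'; printf '%s' "${array[*]}")

# Hypothetical helper: reads text on stdin, writes the filtered text to stdout.
remove_stop_words() {
    sed -E "s/\<($pattern)\>\s*//g"
}

printf '%s\n' 'One of the primary goals in the design of the Unix system' |
    remove_stop_words
# prints: One primary goals design Unix system
```

To filter the file from the question without modifying it, something like remove_stop_words < my_text.txt should work. The match is case-sensitive, as in the loop version.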

Bonus

Try it without \s* to see why I added it to the regex.
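
For anyone who wants to check the result, a minimal comparison follows. It assumes GNU sed (\s is a GNU extension); the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed, because \< \> and \s are GNU extensions.
line='goals in the design'

# Without \s*: only the word is deleted, so the spaces on both sides remain.
without=$(printf '%s\n' "$line" | sed -e 's/\<the\>//g')

# With \s*: the whitespace after the word is deleted along with it.
with=$(printf '%s\n' "$line" | sed -e 's/\<the\>\s*//g')

printf '[%s]\n' "$without"   # [goals in  design]
printf '[%s]\n' "$with"      # [goals in design]
```

Without \s* every removed word leaves a doubled space behind.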

Gilles Quenot answered Sep 20 '22 17:09