
Stop word removal in Javascript [closed]

Hi, I am looking for a library that removes stop words from text in JavaScript. My end goal is to calculate tf-idf and then convert the given document into a vector-space representation, all in JavaScript. Can anyone point me to a library that will help me do that? Just a library to remove the stop words would also be great.

dhaval2025 asked Apr 12 '11 06:04


2 Answers

Use the stopwords provided by the NLTK library:

const stopwords = ['i','me','my','myself','we','our','ours','ourselves','you','your','yours','yourself','yourselves','he','him','his','himself','she','her','hers','herself','it','its','itself','they','them','their','theirs','themselves','what','which','who','whom','this','that','these','those','am','is','are','was','were','be','been','being','have','has','had','having','do','does','did','doing','a','an','the','and','but','if','or','because','as','until','while','of','at','by','for','with','about','against','between','into','through','during','before','after','above','below','to','from','up','down','in','out','on','off','over','under','again','further','then','once','here','there','when','where','why','how','all','any','both','each','few','more','most','other','some','such','no','nor','not','only','own','same','so','than','too','very','s','t','can','will','just','don','should','now']

Then simply pass your string into the following function:

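function remove_stopwords(str) {
    const res = []
    const words = str.split(' ')
    for (let i = 0; i < words.length; i++) {
        // Strip periods so that "me." is compared as "me"
        const word_clean = words[i].split(".").join("")
        // The comparison is case-sensitive: "I" does not match 'i'
        if (!stopwords.includes(word_clean)) {
            res.push(word_clean)
        }
    }
    return res.join(' ')
}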

Example:

remove_stopwords("I will go to the place where there are things for me.")

Result:

I go place things

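The capitalized "I" survives because the comparison is case-sensitive; lowercase each word before the lookup if you want it removed as well. Add any words that aren't already covered to the stopwords array.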
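
A possible variant of the function above, sketched under a few assumptions: the names removeStopwords and stopwordSet are hypothetical, the set below holds only a handful of sample words (in practice you would build it from the full list), and punctuation is trimmed only at the start and end of each word. It lowercases each word before the lookup and uses a Set, so membership checks do not scan the whole array.

```javascript
// Hypothetical sample set; in practice build it from the full stop word list.
const stopwordSet = new Set(['i', 'me', 'will', 'to', 'the', 'where', 'there', 'are', 'for']);

// Hypothetical helper, not part of NLTK or any other library.
function removeStopwords(text, stopSet = stopwordSet) {
    return text
        .split(/\s+/)                                    // split on any run of whitespace
        .map(w => w.replace(/^[^\w']+|[^\w']+$/g, ''))   // trim leading/trailing punctuation
        .filter(w => w.length > 0 && !stopSet.has(w.toLowerCase())) // case-insensitive lookup
        .join(' ');
}
```

With this sample set, removeStopwords("I will go to the place where there are things for me.") returns "go place things". The original casing of the words that are kept is preserved; only the lookup is lowercased.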

Cybernetic answered Sep 23 '22 07:09


I don't think there is a library for this; you can download a stop word list from https://www.ranks.nl/stopwords.

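Then remove each stop word from the text, matching whole words only so that every occurrence is removed and substrings of longer words are left alone:

text = text.replace(new RegExp("\\b" + stopword + "\\b", "gi"), "")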
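
To apply the replacement to a whole list at once, one option is to build a single regular expression from the list. This is only a sketch: escapeRegExp and stripStopwords are hypothetical helper names, the list is assumed to be a non-empty array of plain strings, and leftover punctuation is not cleaned up.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: remove every whole-word occurrence of the listed stop words.
function stripStopwords(text, stopwordList) {
    // Builds one alternation such as \b(?:the|on|there)\b,
    // matched globally and case-insensitively.
    const pattern = new RegExp(
        '\\b(?:' + stopwordList.map(escapeRegExp).join('|') + ')\\b', 'gi');
    return text
        .replace(pattern, '')    // drop every whole-word match
        .replace(/\s+/g, ' ')    // collapse the gaps left behind
        .trim();
}
```

The \b anchors keep a stop word such as "the" from being cut out of a longer word like "theory", which a plain string replace would do.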

yura answered Sep 22 '22 07:09