
How to remove stopwords using Stanford NLP

I want to parse a document using Stanford NLP and remove the stopwords from it. Is there an API for removing stopwords? I found a StopWords class, but I don't know how to use it. Please suggest how to do this.

Thanks

user2609542 asked Jul 25 '13 03:07



2 Answers

I think you can use this annotator to remove stop words https://github.com/jconwell/coreNlp
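Whichever annotator you end up using, the underlying step is the same: tokenize the text and drop every token that appears in a stopword set. Here is a minimal sketch of that idea in plain Java. It does not use the API of either Stanford CoreNLP or the linked annotator; the class name `StopwordFilter`, the method `removeStopwords`, the tiny stopword list, and the whitespace tokenizer are all assumptions made up for illustration. In a real pipeline the tokens would come from CoreNLP's tokenizer instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Stanford CoreNLP.
class StopwordFilter {

    // Deliberately tiny, made-up stopword list; a real list would be much longer.
    private static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "and", "or", "but", "in", "of", "to", "is"));

    // Splits the text on whitespace (a naive stand-in for a real tokenizer)
    // and keeps only the tokens that are not in the stopword set.
    static List<String> removeStopwords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only when the input is blank
            }
            // Compare case-insensitively so "The" matches "the".
            if (!STOPWORDS.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [parser, able, read, document]
        System.out.println(removeStopwords("The parser is able to read the document"));
    }
}
```

Note that this sketch does not strip punctuation, so a token like "the," would not match; a proper tokenizer handles that for you.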

Raju Penumatsa answered Oct 10 '22 03:10


If I'm correct, the annotator mentioned by @Raju Penumatsa above is available on Maven here: https://mvnrepository.com/artifact/com.zensols/stopword-annotator and is maintained in a separate Git repo here: https://github.com/plandes/stopword-annotator

With the Maven artifact, you can declare the annotator as a dependency in a build tool such as Maven or Gradle instead of copying the library onto your classpath manually, which is easier and more maintainable. The Git repo I linked moved the stopword plugin out of the jconwell/coreNlp project into its own repo and added the metadata needed to publish it on Maven Central.

gneusch answered Oct 10 '22 03:10