Is stopwords removal ,Stemming and Lemmatization necessary for text classification while using Spacy,Bert or other advanced NLP models for getting the vector embedding of the text ?
text="The food served in the wedding was very delicious"
1.since Spacy,Bert were trained on huge raw datasets are there any benefits of apply stopwords removal ,Stemming and Lemmatization on these text before generating the embedding using bert/spacy for text classification task ?
2.I can understand stopwords removal ,Stemming and Lemmatization will be good when we use countvectorizer,tfidf vectorizer to get embedding of sentences .
You can test to see if doing stemming lemmatization and stopword removal helps. It doesn't always. I usually do if I gonna graph as the stopwords clutter up the results.
A case for not using Stopwords Using Stopwords will provide context to the user's intent, so when you use a contextual model like BERT. In such models like BERT, all stopwords are kept to provide enough context information like the negation words (not, nor, never) which are considered to be stopwords.
According to https://arxiv.org/pdf/1904.07531.pdf
"Surprisingly, the stopwords received as much attention as non-stop words, but removing them has no effect inMRR performances. "
With BERT you don't process the texts; otherwise, you lose the context (stemming, lemmatization) or change the texts outright (stop words removal).
Some more basic models (rule-based or bag-of-words) would benefit from some processing, but you must be very careful with stop words removal: many words that change the meaning of an entire sentence are stop words (not, no, never, unless).
Do not remove SW, as they add new information(context-awareness) to the sentence (viz., text summarization, machine/language translation, language modeling, question-answering)
Remove SW if we want only general idea of the sentence (viz., sentiment analysis, language/text classification, spam filtering, caption generation, auto-tag generation, topic/document
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With