
PySpark: how to configure StopWordsRemover for French on Spark 1.6.3

I would like to know how to configure StopWordsRemover for the French language in Spark 1.6.3.

I'm currently using PySpark.

Thanks for your help.

Best regards,

nassimlaga asked Oct 23 '25 22:10


1 Answer

Take a look at the NLTK package, which ships stop word lists for several languages.

I use it for Portuguese words:

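from pyspark.ml.feature import StopWordsRemover
import nltk

# Fetch the NLTK stop word corpus (only needed the first time)
nltk.download("stopwords")

...

# Load the stop words for the target language
stopwordList = nltk.corpus.stopwords.words('portuguese')
# tokenizer is the Tokenizer stage defined in the elided part above
remover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="stopWordsRem", stopWords=stopwordList)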
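For French, the same approach should carry over by asking NLTK for its 'french' list instead. Here is a minimal sketch of that, not a tested Spark 1.6.3 setup: the helper names (merge_stop_words, french_stop_words, build_french_remover) are made up for illustration, and the pyspark and nltk imports are kept inside the functions so the merging helper can be used on its own.

```python
def merge_stop_words(base, extra=()):
    """Lowercase, strip and de-duplicate stop words, keeping first-seen order."""
    seen = set()
    merged = []
    for word in list(base) + list(extra):
        word = word.strip().lower()
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


def french_stop_words(extra=()):
    """Return NLTK's French stop words plus any extra words (hypothetical helper)."""
    import nltk  # imported lazily; requires the nltk package
    nltk.download("stopwords", quiet=True)  # fetches the corpus if it is missing
    from nltk.corpus import stopwords
    return merge_stop_words(stopwords.words("french"), extra)


def build_french_remover(input_col, output_col, stop_words):
    """Build a StopWordsRemover from an explicit word list (hypothetical helper)."""
    from pyspark.ml.feature import StopWordsRemover  # requires pyspark
    return StopWordsRemover(inputCol=input_col, outputCol=output_col,
                            stopWords=stop_words)
```

With a tokenizer stage already defined, usage would look something like build_french_remover(tokenizer.getOutputCol(), "stopWordsRem", french_stop_words()). The extra argument is only there in case you want to append domain-specific words to the NLTK list.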

Hope it helps.

André Machado answered Oct 26 '25 17:10