I tried this, but it doesn't work:
from nltk.corpus import stopwords
stopwords_list = stopwords.words('arabic')
print(stopwords_list)
Update [January 2018]: The nltk data repository has included Arabic stopwords since October, 2017, so this issue no longer arises. The above code will work as expected.
As of October 2017, nltk includes a collection of Arabic stopwords. If you ran nltk.download() after that date, this issue will not arise. If you have been using nltk for some time and you lack the Arabic stopwords, use nltk.download() to update your stopwords corpus.
If you call nltk.download() without arguments, you'll find that the stopwords corpus is shown as "out of date" (in red). Download the current version that includes Arabic.
Alternatively, you can simply update the stopwords corpus by running the following code once, from the interactive prompt:
>>> import nltk
>>> nltk.download("stopwords")
Note:
Looking words up in a list is really slow. Use a set, not a list. E.g.,
arb_stopwords = set(nltk.corpus.stopwords.words("arabic"))
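To illustrate the set-based lookup, here is a minimal sketch of filtering tokens against a stopword set. The short word list and the helper name remove_stopwords are my own illustrative assumptions, not part of nltk; in practice you would build the set from stopwords.words("arabic") as shown above.

```python
# Hypothetical sample list so the snippet runs without the corpus installed.
# In real use, build the set from nltk.corpus.stopwords.words("arabic").
sample_stopwords = ["في", "من", "على", "إلى", "عن"]
arb_stopwords = set(sample_stopwords)


def remove_stopwords(tokens, stopword_set):
    """Return the tokens that are not in stopword_set, preserving order."""
    # Membership tests on a set take constant time on average,
    # while a list is scanned element by element.
    return [tok for tok in tokens if tok not in stopword_set]


# Naive whitespace tokenization, just for the demonstration.
tokens = "ذهب الولد إلى المدرسة في الصباح".split()
filtered = remove_stopwords(tokens, arb_stopwords)
print(filtered)
```

The difference hardly matters for a single sentence, but it adds up quickly when you filter a whole corpus token by token.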
Why don't you just check what the stopwords collection contains:
>>> from nltk.corpus import stopwords
>>> stopwords.fileids()
['danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian',
'italian', 'norwegian', 'portuguese', 'russian', 'spanish', 'swedish',
'turkish']
So no, there's no list for Arabic. I'm not sure what you mean by "add it", but the stopwords lists are just lists of words. They don't even do morphological analysis, or other things you might want in an inflecting language. So if you have (or can put together) a list of Arabic stopwords, just put them in a set() and you're one step ahead of where you'd be if your code worked.
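If you do put together your own list, one way to load it is sketched below. It assumes a plain UTF-8 text file with one word per line; the helper name load_stopwords and the temporary demo file are hypothetical stand-ins for whatever list you actually have.

```python
import os
import tempfile


def load_stopwords(path):
    """Read one stopword per line (UTF-8) and return them as a set."""
    with open(path, encoding="utf-8") as f:
        # Strip surrounding whitespace and skip blank lines.
        return {line.strip() for line in f if line.strip()}


# A temporary file stands in for your hand-built word list.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("في\nمن\n\nعلى\n")
    demo_path = tmp.name

custom_stopwords = load_stopwords(demo_path)
os.remove(demo_path)  # clean up the demo file

print("من" in custom_stopwords)
```

The result is an ordinary set, so you can use it exactly as you would use the words from the nltk corpus.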