
Does nltk contain Arabic stop words? If not, how can I add them?

Tags: nltk, arabic

I tried this, but it doesn't work:

from nltk.corpus import stopwords
stopwords_list = stopwords.words('arabic')
print(stopwords_list)

Update [January 2018]: The nltk data repository has included Arabic stopwords since October, 2017, so this issue no longer arises. The above code will work as expected.

lina asked Dec 14 '22 00:12


1 Answer

As of October 2017, nltk includes a collection of Arabic stopwords. If you ran nltk.download() after that date, this issue will not arise. If you have been using nltk for some time and lack the Arabic stopwords, use nltk.download() to update your stopwords corpus.

  1. If you call nltk.download() without arguments, you'll find that the stopwords corpus is shown as "out of date" (in red). Download the current version that includes Arabic.

  2. Alternatively, you can simply update the stopwords corpus by running the following code once, from the interactive prompt:

    >>> import nltk
    >>> nltk.download("stopwords")
    

Note:

Looking words up in a list is really slow, because every membership test scans the list from the start. Use a set, not a list. E.g.,

import nltk
arb_stopwords = set(nltk.corpus.stopwords.words("arabic"))
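
To show how such a set would typically be used, here is a minimal sketch of filtering tokens against it. The helper name remove_stopwords and the four-word sample set are illustrative assumptions, not part of nltk; the sample only stands in for the real set so the snippet runs without the corpus installed. Whitespace splitting is also a simplification, not proper Arabic tokenization.

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in `stopwords`, preserving their order."""
    return [tok for tok in tokens if tok not in stopwords]


# Hand-picked sample (assumption) standing in for
# set(nltk.corpus.stopwords.words("arabic")).
sample_stopwords = {"في", "من", "على", "إلى"}

# Naive whitespace tokenization, for illustration only.
tokens = "ذهب الولد إلى المدرسة في الصباح".split()
filtered = remove_stopwords(tokens, sample_stopwords)
print(filtered)
```

In real code you would pass arb_stopwords instead of the sample set; each "tok not in stopwords" check is then a hash lookup rather than a scan of the whole list.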

Original answer (still applicable to languages that are not included)

Why don't you just check what the stopwords collection contains:

>>> from nltk.corpus import stopwords
>>> stopwords.fileids()
['danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian',
 'italian', 'norwegian', 'portuguese', 'russian', 'spanish', 'swedish',
 'turkish']

So no, there's no list for Arabic. I'm not sure what you mean by "add it", but the stopwords lists are just lists of words. They don't even do morphological analysis, or other things you might want in an inflecting language. So if you have (or can put together) a list of Arabic stopwords, just put them in a set() and you're one step ahead of where you'd be if your code worked.
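
For a language that is not included, one way to "put them in a set()" is to keep your own word list in a text file. This is a rough sketch, assuming a UTF-8 file with one stopword per line; the function name load_stopwords and the file layout are my own assumptions, not an nltk convention.

```python
from pathlib import Path


def load_stopwords(path):
    """Read one stopword per line from a UTF-8 text file into a set."""
    text = Path(path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and skip blank lines; duplicates collapse in the set.
    return {line.strip() for line in text.splitlines() if line.strip()}
```

You would then call something like load_stopwords("my_arabic_stopwords.txt") (a hypothetical file name) and use the result exactly like a set built from stopwords.words().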

alexis answered May 06 '23 12:05