NLTK and Stopwords Fail #lookuperror

Question

I am trying to start a sentiment analysis project, and I will use the stop words method. I did some research and found that NLTK has stopwords, but when I execute the command there is an error.

What I do is the following, in order to know which words NLTK uses (like what you may find here http://www.nltk.org/book/ch02.html in section 4.1):

from nltk.corpus import stopwords
stopwords.words('english')

But when I press Enter, I get:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-6-ff9cd17f22b2> in <module>()
----> 1 stopwords.words('english')

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
 66
 67     def __getattr__(self, attr):
---> 68         self.__load()
 69         # This looks circular, but its not, since __load() changes our
 70         # __class__ to something new:

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
 54             except LookupError, e:
 55                 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56                 except LookupError: raise e
 57
 58         # Load the corpus.

LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
- 'C:\Users\Meru/nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'
- 'C:\Users\Meru\Anaconda\nltk_data'
- 'C:\Users\Meru\Anaconda\lib\nltk_data'
- 'C:\Users\Meru\AppData\Roaming\nltk_data'
**********************************************************************

And because of this problem, code like the following cannot run either (it raises the same error):

>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print([i for i in sentence.split() if i not in stop])

Do you know what the problem may be? I need to work with Spanish words; would you recommend another method? I also thought of using the Goslate package with English datasets.

Thanks for reading!

P.S.: I use Anaconda.

tttthomasssss · Accepted Answer

You don't seem to have the stopwords corpus on your computer.

You need to start the NLTK Downloader and download all the data you need.

Open a Python console and do the following:

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/

In the GUI window that opens simply press the 'Download' button to download all corpora or go to the 'Corpora' tab and only download the ones you need/want.
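If the GUI does not open (for example on a headless machine), the same result can be reached without it. As a minimal sketch, assuming only `nltk.data.find` (which raises `LookupError` for a missing resource, as in the traceback in the question) and `nltk.download` called with a package ID, the check-then-download step can be wrapped in a helper. The name `ensure_nltk_resource` and its `find`/`download` parameters are hypothetical; they exist only so the logic can be exercised with stand-ins instead of the real NLTK calls.

```python
from typing import Callable, Optional


def ensure_nltk_resource(
    resource_path: str,
    package_id: str,
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> bool:
    """Download an NLTK package only if its resource cannot be found.

    Returns True if a download was attempted, False if the resource
    was already available.
    """
    if find is None or download is None:
        # Imported here so the helper can be run with stand-in
        # find/download callables even when NLTK is not installed.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download

    try:
        find(resource_path)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        download(package_id)  # fetch by package ID; no GUI involved
        return True
```

Usage: `ensure_nltk_resource("corpora/stopwords", "stopwords")` returns True when it had to attempt a download and False when the corpus was already present, so it is safe to call at the top of every script.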

Abu Shoeb · Answer

I tried this from an Ubuntu terminal, and for some reason the GUI described in tttthomasssss's answer didn't show up. So I followed the comment from KLDavenport and it worked. Here is the summary:

Open your terminal/command line, type python, and then run:

>>> import nltk
>>> nltk.download("stopwords")

This will store the stopwords corpus under nltk_data. In my case it was /home/myusername/nltk_data/corpora/stopwords.
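The directories NLTK searches are roughly the ones printed under "Searched in:" in the error message, and at runtime they are available as the list `nltk.data.path`. A simplified sketch for checking which of those directories actually hold an unzipped stopwords corpus follows; the helper `corpus_locations` is hypothetical, and unlike NLTK's own lookup it ignores `.zip` archives.

```python
import os
from typing import Iterable, List


def corpus_locations(search_paths: Iterable[str], resource: str = "corpora/stopwords") -> List[str]:
    """Return the paths under search_paths that contain an unzipped copy of resource.

    Simplified: NLTK's real lookup also checks .zip archives; this does not.
    """
    parts = resource.split("/")  # NLTK resource names use "/" on every OS
    found = []
    for base in search_paths:
        candidate = os.path.join(base, *parts)
        if os.path.isdir(candidate):
            found.append(candidate)
    return found
```

Usage: `corpus_locations(nltk.data.path)` returns an empty list when none of the searched directories contain the corpus, which matches the situation in the question's LookupError.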

If you need another corpus, visit the NLTK data page, find the corpus's ID, and then use that ID to download it the same way we did for stopwords.
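Downloading several packages by ID follows the same pattern as `nltk.download("stopwords")`. A small sketch, assuming that `nltk.download(..., quiet=True)` returns a falsy value when a package fails to download (worth checking against your NLTK version); `download_packages` is a hypothetical helper, and the `download` parameter only exists so it can be run with a stand-in.

```python
from typing import Callable, Iterable, List, Optional


def download_packages(
    package_ids: Iterable[str],
    download: Optional[Callable[..., object]] = None,
) -> List[str]:
    """Download each NLTK package ID and return the IDs that failed."""
    if download is None:
        import nltk  # imported lazily; pass a stand-in to run without NLTK
        download = nltk.download
    failed = []
    for package_id in package_ids:
        # Assumption: the downloader returns a falsy value on failure.
        if not download(package_id, quiet=True):
            failed.append(package_id)
    return failed
```

Usage: `download_packages(["stopwords", "punkt"])` fetches both packages without printing progress and returns the list of IDs that could not be downloaded.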

Haseeb · Answer

import nltk
nltk.download('stopwords')  # fetch the corpus by ID; no GUI needed
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))  # a set makes membership tests fast
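The set built above plugs directly into the filter from the question. Here is a Python 3 sketch that separates loading the list from filtering and lower-cases tokens, since NLTK's stopword lists are lower-case; `load_stopwords` and `remove_stopwords` are hypothetical helper names. Like the question's code, it splits on whitespace, so punctuation stays attached to words.

```python
from typing import FrozenSet, Iterable, List


def load_stopwords(language: str = "english") -> FrozenSet[str]:
    """Load NLTK's stopword list for `language` as a set for fast lookups."""
    # Imported inside the function so remove_stopwords below can be used
    # (and tested) without NLTK installed.
    from nltk.corpus import stopwords
    return frozenset(stopwords.words(language))


def remove_stopwords(sentence: str, stop: Iterable[str]) -> List[str]:
    """Split on whitespace and drop tokens found in `stop`, ignoring case."""
    stop_set = {word.lower() for word in stop}
    return [token for token in sentence.split() if token.lower() not in stop_set]
```

Usage: `remove_stopwords("this is a foo bar sentence", load_stopwords("english"))` reproduces the question's example once the corpus is downloaded. For the Spanish text mentioned in the question, the NLTK stopwords corpus also ships a Spanish list, so `load_stopwords("spanish")` should work the same way; `stopwords.fileids()` lists the available languages.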


Tags: python, nltk, stop-words, sentiment-analysis

Facundo

3 Answers


