
NLTK data out of date - Python 3.4

I'm trying to install NLTK for Python 3.4. The actual NLTK module appears to have installed fine. I then ran

import nltk

nltk.download()

and chose to download everything. However, after it was done, the window simply says 'out of date'. I tried refreshing and downloading again, yet it stays 'out of date' (screenshot: NLTK Window 1).

I looked online and tried various fixes, but I haven't found any that helped my case yet.

I also tried to manually find the missing parts, which turned out to be 'Open Multilingual Wordnet' and 'Wordnet' (screenshot of how I found the missing parts: Open Multilingual Wordnet).

What should I do? Should I uninstall and reinstall NLTK? I haven't really found a way to delete the packages (except for manually deleting them).

EDIT: Regarding Solution 2 and Solution 3, here is more detail on the Solution 2 issue:

If a package has downloaded successfully, this is the output:

>>> nltk.download('subjectivity')
[nltk_data] Downloading package subjectivity to
[nltk_data]     C:\Users\Shane\AppData\Roaming\nltk_data...
[nltk_data]   Package subjectivity is already up-to-date!
True

However, for 'wordnet' and 'omw', this is what happens when I redownload:

>>> nltk.download('omw')
[nltk_data] Downloading package omw to
[nltk_data]     C:\Users\Shane\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\omw.zip.
True

pyman asked Oct 17 '15 06:10


1 Answer

In short:

Don't use the GUI; download all packages from within the Python interpreter.

$ python3
>>> import nltk
>>> nltk.download('all')

In long:

The cause might be the recent addition of Open Multilingual WordNet: something may not be working right between the NLTK download GUI and the package index.

Solution 1:

Use the nltk.download() GUI and download only the two packages, without selecting 'all'. (This may not work, but it is worth a try.)

Solution 2:

Install the packages individually through the Python interpreter:

>>> import nltk
>>> nltk.download('wordnet')
>>> nltk.download('omw') # Open Multilingual WordNet
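
The output in the question shows nltk.download() returning True for each package, so Solution 2 can also be scripted as a loop that collects any package id that does not come back truthy. This is a minimal sketch, assuming that return value is a usable success signal. The helper name download_each and its downloader parameter are hypothetical (not part of NLTK); they exist so the loop can be tried without a network connection.

```python
def download_each(package_ids, downloader):
    """Call downloader(package_id) for each id and return the ids that failed.

    `downloader` is assumed to behave like nltk.download, i.e. to return a
    truthy value on success. This helper is illustrative, not an NLTK API.
    """
    failed = []
    for package_id in package_ids:
        try:
            ok = downloader(package_id)
        except Exception as exc:  # e.g. a network error raised mid-download
            print("{}: {}".format(package_id, exc))
            ok = False
        if not ok:
            failed.append(package_id)
    return failed


# Intended use (requires NLTK and a network connection):
#   import nltk
#   print(download_each(['wordnet', 'omw'], nltk.download))
```

An empty list from this helper would mean every listed package reported success.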

Solution 3:

Let nltk.download('all') check through all packages in its index and download any that are not available.

>>> import nltk
>>> nltk.download('all')

Note: If any files were corrupted, possibly due to a broken internet connection, find the directory where NLTK data is stored, remove the affected files, and then proceed with Solution 3.

To find where nltk_data is stored, inspect nltk.data.path, which lists the possible locations:

>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
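
Once the data directory is known, one way to look for archives damaged by an interrupted download is to run an integrity check over every .zip file in it. The following is a rough sketch using only the standard library; the function name find_corrupt_zips is hypothetical, and it assumes a damaged package shows up as an unreadable or CRC-failing zip, which may not cover every kind of corruption.

```python
import os
import zipfile


def find_corrupt_zips(data_dir):
    """Return sorted paths of .zip files under data_dir that fail a zip check."""
    corrupt = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if not name.lower().endswith('.zip'):
                continue
            path = os.path.join(root, name)
            try:
                with zipfile.ZipFile(path) as archive:
                    # testzip() returns the first bad member name, or None.
                    if archive.testzip() is not None:
                        corrupt.append(path)
            except (zipfile.BadZipFile, OSError):
                # Not a readable zip at all, e.g. a truncated download.
                corrupt.append(path)
    return sorted(corrupt)


# Intended use, with a directory taken from nltk.data.path:
#   print(find_corrupt_zips('/home/alvas/nltk_data'))
```

Any path it reports is a candidate for manual inspection before running the download again.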

The point of downloading the data is to use it, so the real test is whether the components you need are present. If those are wordnet and omw, you can try this:

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0]
Synset('bank.n.01')
>>> wn.synsets('bank')[0].lemma_names('spa')
['margen', 'orilla', 'vera']
>>> wn.synsets('bank')[0].lemma_names('fre')
['rive', 'banque']
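
The same presence check can be approximated on disk, without loading a corpus. The sketch below is a standard-library approximation, not NLTK's own lookup logic: it assumes each corpus lives at corpora/<name> or corpora/<name>.zip under one of the search paths, a layout inferred from the "Unzipping corpora\omw.zip" line in the question. The function name missing_corpora is hypothetical.

```python
import os


def missing_corpora(search_paths, corpus_names):
    """Return the corpus names not found under any of the search paths.

    A corpus counts as present if <path>/corpora/<name> is a directory or
    <path>/corpora/<name>.zip is a file (an assumed on-disk layout).
    """
    missing = []
    for name in corpus_names:
        found = False
        for base in search_paths:
            corpora_dir = os.path.join(base, 'corpora')
            unpacked = os.path.join(corpora_dir, name)
            zipped = os.path.join(corpora_dir, name + '.zip')
            if os.path.isdir(unpacked) or os.path.isfile(zipped):
                found = True
                break
        if not found:
            missing.append(name)
    return missing


# Intended use (requires NLTK):
#   import nltk
#   print(missing_corpora(nltk.data.path, ['wordnet', 'omw']))
```

This only confirms that files exist; the wn.synsets() calls above remain the stronger test, since they show the data actually loads.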

Don't worry too much about what the GUI shows. Once nltk.download('all') completes without errors, you have all the corpora and models that NLTK supports.

But as good practice, please raise an issue at https://github.com/nltk/nltk_data/issues so that the developers can check whether the problem can be replicated. Include more screenshots of the error, from before and after trying the proposed solutions too =)

alvas answered Nov 15 '22 18:11