
downloading error using nltk.download()

I am experimenting with the NLTK package in Python. I tried to download the NLTK data using nltk.download(), and I got the following error message. How can I solve this problem? Thanks.

I am running Ubuntu in a VMware virtual machine, and my IDE is Spyder.

[image: screenshot of the error message]

With nltk.download('all'), some packages download successfully, but I get an error message when it downloads oanc_masc:

[image: screenshot of the oanc_masc download error]

user288609 asked Dec 26 '14 14:12


People also ask

What does nltk.download() do?

nltk.downloader is the NLTK corpus and module downloader. This module defines several interfaces which can be used to download corpora, models, and other data packages that can be used with NLTK.

What is nltk.download('wordnet')?

The argument to nltk.download() is not a file or module, but a resource ID that maps to a corpus, machine-learning model, or other resource (or collection of resources) to be installed in your NLTK_DATA area. You can see a list of the available resources and their IDs at http://www.nltk.org/nltk_data/.

How do I download resources from NLTK?

Download individual packages from https://www.nltk.org/nltk_data/ (see the "download" links) and unzip them into the appropriate subfolder. For example, the Brown Corpus, found at https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip, should be unzipped to nltk_data/corpora/brown.
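The manual route described above can be sketched in Python. This is only an illustration: install_package_zip is a hypothetical helper name, and it assumes the zip was already downloaded by hand and that it contains a top-level folder named after the package (brown/ for brown.zip), so extracting into the subfolder produces nltk_data/corpora/brown.

```python
import zipfile
from pathlib import Path


def install_package_zip(zip_path, nltk_data_dir, category="corpora"):
    """Extract a manually downloaded NLTK package zip.

    Hypothetical helper, not part of NLTK. Assumes the archive holds a
    top-level folder named after the package, so the files end up in
    <nltk_data_dir>/<category>/<package>/.
    """
    target = Path(nltk_data_dir).expanduser() / category
    # Create nltk_data/<category> if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, install_package_zip("brown.zip", "~/nltk_data") would extract a downloaded brown.zip into ~/nltk_data/corpora.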


3 Answers

To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer:

$ python3
>>> import nltk
>>> nltk.download('punkt')

If you're unsure which data or models you need, you can start with the basic set of data and models:

>>> import nltk
>>> nltk.download('popular')

It will download a list of "popular" resources.
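In a script, it can be convenient to download a resource only when it is not already installed. A minimal sketch, assuming the resource path and package ID are known in advance ('tokenizers/punkt' and 'punkt' here); ensure_resource is a hypothetical helper name, and the find/download parameters exist only so the logic can be exercised without NLTK or a network connection:

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download package_id only if resource_path cannot be located.

    Hypothetical helper. By default it uses nltk.data.find, which raises
    LookupError for a missing resource, and nltk.download.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested with stubs
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)
        return False  # already installed, nothing to do
    except LookupError:
        download(package_id)
        return True
```

Calling ensure_resource('tokenizers/punkt', 'punkt') would then fetch punkt on the first run and skip the download afterwards.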

Make sure you have the latest version of NLTK, because it is constantly being improved and maintained:

$ pip install --upgrade nltk

EDITED

In case anyone is trying to avoid errors when downloading larger datasets from nltk, here is a workaround from https://stackoverflow.com/a/38135306/610569:

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')

To find the nltk_data directory, see https://stackoverflow.com/a/36383314/610569

To configure the nltk_data path, see https://stackoverflow.com/a/22987374/610569
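As a rough illustration of configuring the data path from Python: NLTK reads the NLTK_DATA environment variable when it is first imported and also keeps a search list in nltk.data.path. The sketch below sets both; configure_nltk_data is a hypothetical helper name and the directory is whatever you choose.

```python
import os
from pathlib import Path


def configure_nltk_data(data_dir):
    """Point NLTK at a custom data directory (hypothetical helper).

    Sets NLTK_DATA for the case where nltk has not been imported yet,
    and also updates nltk.data.path when nltk is available.
    """
    data_dir = Path(data_dir).expanduser()
    data_dir.mkdir(parents=True, exist_ok=True)
    os.environ["NLTK_DATA"] = str(data_dir)
    try:
        import nltk
    except ImportError:
        return data_dir  # nltk not installed; only the variable is set
    if str(data_dir) not in nltk.data.path:
        # Search the custom directory before the default locations.
        nltk.data.path.insert(0, str(data_dir))
    return data_dir
```

A download can then be sent to the same place with nltk.download('punkt', download_dir=str(data_dir)).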

alvas answered Oct 20 '22 00:10


At the Python prompt, after importing nltk, try:

nltk.download('popular', halt_on_error=False)

After an error, it will ask whether to retry the broken package. Decline with n and it will continue with the remaining packages.
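For unattended runs where nobody can answer the retry prompt, one alternative is to download packages one at a time and collect the ones that fail. This is a sketch under assumptions: download_each is a hypothetical helper, the list of package IDs is supplied by the caller, and nltk.download() is assumed to return a false value when a package fails with its default settings.

```python
def download_each(package_ids, download=None):
    """Try each package separately and return the IDs that failed.

    Hypothetical helper. Assumes download(package_id, quiet=True) returns
    a true value on success and a false value on failure, as
    nltk.download is assumed to do with its default settings.
    """
    if download is None:
        import nltk  # imported lazily so the helper can be tested with stubs
        download = nltk.download
    failed = []
    for package_id in package_ids:
        try:
            ok = download(package_id, quiet=True)
        except Exception:
            # Treat any unexpected exception as a failed package.
            ok = False
        if not ok:
            failed.append(package_id)
    return failed
```

For example, download_each(['punkt', 'wordnet', 'oanc_masc']) would return the IDs that could not be installed, which can be retried later.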

tolgayilmaz answered Oct 20 '22 01:10


a) On OS X, either run:

sudo /Applications/Python\ 3.6/Install\ Certificates.command

b) or switch to an admin user (one you have set up with administrator privileges) and type at the command line:

/Applications/Python\ 3.6/Install\ Certificates.command

Notes:

  • The "\" characters are necessary because they escape the spaces in the file names.
  • This procedure works if you have Python 3.6 installed. Otherwise, change the path to match your installed Python version. To find it, run:

ls /Applications

and look for the name of the Python directory listed there.
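The Install Certificates.command script installs a CA certificate bundle for python.org builds of Python on macOS. If the download error is an SSL certificate verification failure (an assumption, since the original error message is not reproduced here), a quick check like the following sketch shows whether Python can see a default CA bundle. ca_bundle_status is a hypothetical helper name; it uses only the standard ssl module.

```python
import os
import ssl


def ca_bundle_status():
    """Report where Python looks for CA certificates by default.

    Hypothetical diagnostic helper built on ssl.get_default_verify_paths().
    """
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        # True only if the reported location actually exists on disk.
        "cafile_exists": bool(paths.cafile) and os.path.isfile(paths.cafile),
        "capath_exists": bool(paths.capath) and os.path.isdir(paths.capath),
    }
```

If both cafile_exists and capath_exists are False, missing certificates are a plausible cause and running the command above is worth trying.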

Alexandre answered Oct 20 '22 02:10