I am experimenting with the NLTK package in Python. I tried to download the NLTK data using nltk.download(), but I got an error message. How can I solve this problem? Thanks.
The system is Ubuntu running under VMware, and the IDE is Spyder.
After running nltk.download('all'), some packages download successfully, but I get an error message when it reaches oanc_masc.
From the documentation of the nltk.downloader module: the NLTK corpus and module downloader defines several interfaces that can be used to download corpora, models, and other data packages for use with NLTK.
The argument to nltk.download() is not a file or module, but a resource ID that maps to a corpus, machine-learning model, or other resource (or collection of resources) to be installed in your NLTK_DATA area. You can see a list of the available resources and their IDs at http://www.nltk.org/nltk_data/.
Alternatively, download individual packages from https://www.nltk.org/nltk_data/ (see the "download" links) and unzip them to the appropriate subfolder. For example, the Brown Corpus, found at https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip, should be unzipped to nltk_data/corpora/brown.
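The manual route above can be scripted. This is a minimal sketch, assuming the package zip has already been downloaded to a local file and that it contains a single top-level folder named after the package (as brown.zip does). The name install_package_zip and its parameters are hypothetical and not part of NLTK:

```python
import zipfile
from pathlib import Path


def install_package_zip(zip_path, nltk_data_dir, category="corpora"):
    """Unzip a manually downloaded package into nltk_data/<category>/.

    Hypothetical helper, not part of NLTK. Assumes the archive holds one
    top-level folder named after the package (e.g. brown/...).
    """
    target = Path(nltk_data_dir).expanduser() / category
    # Create nltk_data/<category> if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, install_package_zip('brown.zip', '~/nltk_data') would place the files under ~/nltk_data/corpora/brown. For other kinds of packages, pass the matching category, such as 'tokenizers'.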
To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
If you're unsure of which data/model you need, you can start out with the basic list of data + models with:
>>> import nltk
>>> nltk.download('popular')
It will download a list of "popular" resources.
Make sure you have the latest version of NLTK, because it is actively maintained and regularly improved:
$ pip install --upgrade nltk
In case anyone wants to avoid errors when downloading larger datasets with nltk, here is a workaround from https://stackoverflow.com/a/38135306/610569:
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')
To find the nltk_data directory, see https://stackoverflow.com/a/36383314/610569
To configure the nltk_data path, see https://stackoverflow.com/a/22987374/610569
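As a rough sketch of the configuration side: NLTK reads the NLTK_DATA environment variable when it is imported, and it also searches the directories listed in nltk.data.path. The helper below (use_nltk_data_dir is a hypothetical name) sets both, assuming you want a single custom data directory:

```python
import os
import sys
from pathlib import Path


def use_nltk_data_dir(directory):
    """Point NLTK at a custom data directory (hypothetical helper).

    Sets NLTK_DATA for a later `import nltk`, replacing any previous value.
    If nltk is already imported, the directory is also appended to
    nltk.data.path, since the environment variable is only read at import.
    """
    data_dir = Path(directory).expanduser().resolve()
    data_dir.mkdir(parents=True, exist_ok=True)
    os.environ["NLTK_DATA"] = str(data_dir)

    nltk_module = sys.modules.get("nltk")
    if nltk_module is not None and str(data_dir) not in nltk_module.data.path:
        nltk_module.data.path.append(str(data_dir))
    return str(data_dir)
```

To download into the same directory, pass it explicitly, for example nltk.download('punkt', download_dir=use_nltk_data_dir('~/my_nltk_data')), where the path is only an illustration.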
From the Python prompt, after importing nltk, try:
nltk.download('popular', halt_on_error=False)
After an error, it asks whether to retry the broken package. Decline with n and it continues with the remaining packages.
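If answering the retry prompt by hand is inconvenient, another option is to download packages one at a time and collect the ones that fail. This is a minimal sketch, assuming nltk.download returns a falsy value (or raises) when a package cannot be installed. The names download_each and download_fn are hypothetical and introduced only for this example:

```python
def download_each(package_ids, download_fn=None):
    """Try each package separately and return the IDs that failed.

    Hypothetical helper. download_fn defaults to nltk.download; it is a
    parameter so the loop can be exercised without network access.
    """
    if download_fn is None:
        import nltk  # imported lazily, only when the real downloader is needed

        download_fn = nltk.download

    failed = []
    for package_id in package_ids:
        try:
            ok = download_fn(package_id)
        except Exception:
            # Treat any exception from the downloader as a failed package.
            ok = False
        if not ok:
            failed.append(package_id)
    return failed
```

Calling download_each(['punkt', 'brown', 'oanc_masc']) would then return the list of packages that could not be installed, which can be retried later or installed manually.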
On OS X, either:
a) run
sudo /Applications/Python\ 3.6/Install\ Certificates.command
or
b) switch to an admin user (the one you have set up with administrator privileges) and type at the command line:
/Applications/Python\ 3.6/Install\ Certificates.command
Notes: if your Python version differs, run
ls /Applications
and look at the Python directory name you have there.
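The lookup in the note can also be done from Python. This is a small sketch, assuming the installer script sits in a folder whose name starts with "Python " directly under /Applications, as in the paths above. The function name find_certificate_installers is hypothetical:

```python
from pathlib import Path


def find_certificate_installers(applications_dir="/Applications"):
    """List 'Install Certificates.command' scripts in Python folders.

    Hypothetical helper. Looks only one level down, in folders named
    'Python <something>', and returns the matches sorted by path.
    """
    root = Path(applications_dir)
    return sorted(root.glob("Python */Install Certificates.command"))
```

Each returned path can then be run from the command line, as in options a) and b).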