Update: NLTK works well with Python 2.7. I had Python 3.2 installed; I uninstalled it, installed 2.7, and now it works!
I have installed NLTK and tried to download the NLTK data, following the instructions on this site: http://www.nltk.org/data.html
I downloaded NLTK, installed it, and then tried to run the following code:
>>> import nltk
>>> nltk.download()
It gave me the following error:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    nltk.download()
AttributeError: 'module' object has no attribute 'download'

Directory of C:\Python32\Lib\site-packages
I tried both nltk.download() and nltk.downloader(); both gave me error messages.
Then I used help(nltk) to inspect the package, and it shows the following info:
NAME
    nltk

PACKAGE CONTENTS
    align
    app (package)
    book
    ccg (package)
    chat (package)
    chunk (package)
    classify (package)
    cluster (package)
    collocations
    corpus (package)
    data
    decorators
    downloader
    draw (package)
    examples (package)
    featstruct
    grammar
    help
    inference (package)
    internals
    lazyimport
    metrics (package)
    misc (package)
    model (package)
    parse (package)
    probability
    sem (package)
    sourcedstring
    stem (package)
    tag (package)
    test (package)
    text
    tokenize (package)
    toolbox
    tree
    treetransforms
    util
    yamltags

FILE
    c:\python32\lib\site-packages\nltk
I do see downloader there, so I'm not sure why it does not work. I'm using Python 3.2.2 on Windows Vista.
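When nltk.download is missing, it helps to confirm which interpreter is running and which NLTK build it actually imports. The sketch below relies only on sys, importlib and the conventional __version__/__file__ module attributes; describe_nltk is a hypothetical helper name, not part of NLTK.

```python
import importlib
import sys


def describe_nltk():
    """Report the Python version and whether the importable NLTK exposes download()."""
    # %-formatting (not f-strings) so this also runs on old interpreters like 3.2.
    info = {"python": "%d.%d.%d" % sys.version_info[:3]}
    try:
        nltk = importlib.import_module("nltk")
    except ImportError:
        # NLTK is not importable from this interpreter at all.
        info["nltk_installed"] = False
        return info
    info["nltk_installed"] = True
    info["nltk_version"] = getattr(nltk, "__version__", "unknown")
    info["nltk_location"] = getattr(nltk, "__file__", "unknown")
    # False here corresponds to the AttributeError shown in the question.
    info["has_download"] = hasattr(nltk, "download")
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_nltk().items()):
        print("%s: %s" % (key, value))
```

If nltk_location points somewhere unexpected, or has_download is False, the interpreter is picking up a different (or older) NLTK than the one you meant to install.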
The argument to nltk.download() is not a file or module, but a resource id that maps to a corpus, machine-learning model, or other resource (or collection of resources) to be installed in your NLTK_DATA area. You can see a list of the available resources, and their IDs, at http://www.nltk.org/nltk_data/ .
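Because the argument is just a resource id string, fetching several resources amounts to looping over ids. A minimal sketch, assuming the ids come from the list at http://www.nltk.org/nltk_data/. download_all and its downloader parameter are hypothetical; the parameter only exists so the helper can be exercised without network access. By default it calls nltk.download(resource_id, quiet=True) and treats a falsy return value as a failure.

```python
def download_all(resource_ids, downloader=None):
    """Download each NLTK resource id and return the ids that failed."""
    if downloader is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        def downloader(resource_id):
            # quiet=True suppresses the per-package progress messages.
            return nltk.download(resource_id, quiet=True)

    failed = []
    for resource_id in resource_ids:
        # Ids are plain strings such as 'punkt' or 'stopwords',
        # not module paths such as 'nltk.tokenize'.
        if not downloader(resource_id):
            failed.append(resource_id)
    return failed
```

For example, download_all(['punkt', 'stopwords']) should return an empty list when both downloads succeed.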
1. Install NumPy (optional): run sudo pip install -U numpy
2. Install NLTK: run sudo pip install -U nltk
3. Test installation: run python, then type import nltk
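The "Test installation" step can also be scripted. This sketch uses importlib.util.find_spec (Python 3.4+) to check whether each package is importable without actually importing it. check_installed is a hypothetical helper name.

```python
import importlib.util


def check_installed(packages=("numpy", "nltk")):
    """Map each top-level package name to True if it can be found on sys.path."""
    # find_spec returns None when no module or package with that name is found.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in sorted(check_installed().items()):
        print("%-6s %s" % (name, "installed" if ok else "missing"))
```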
To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
If you're unsure which data or models you need, you can start with the basic list of data and models:
>>> import nltk
>>> nltk.download('popular')
It will download a list of "popular" resources, which includes:
<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="gazetteers" />
  <item ref="genesis" />
  <item ref="gutenberg" />
  <item ref="inaugural" />
  <item ref="movie_reviews" />
  <item ref="names" />
  <item ref="shakespeare" />
  <item ref="stopwords" />
  <item ref="treebank" />
  <item ref="twitter_samples" />
  <item ref="omw" />
  <item ref="wordnet" />
  <item ref="wordnet_ic" />
  <item ref="words" />
  <item ref="maxent_ne_chunker" />
  <item ref="punkt" />
  <item ref="snowball_data" />
  <item ref="averaged_perceptron_tagger" />
</collection>
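The collection entry above is plain XML, so the ids it bundles can be extracted with the standard library, for instance to download only a subset. This sketch parses a copy of the snippet quoted here with xml.etree.ElementTree. It does not fetch the live index, whose contents may differ between NLTK versions. POPULAR_XML and collection_ids are illustrative names.

```python
import xml.etree.ElementTree as ET

# Copy of the "popular" collection entry quoted above (not fetched live).
POPULAR_XML = """
<collection id="popular" name="Popular packages">
  <item ref="cmudict" /> <item ref="gazetteers" /> <item ref="genesis" />
  <item ref="gutenberg" /> <item ref="inaugural" /> <item ref="movie_reviews" />
  <item ref="names" /> <item ref="shakespeare" /> <item ref="stopwords" />
  <item ref="treebank" /> <item ref="twitter_samples" /> <item ref="omw" />
  <item ref="wordnet" /> <item ref="wordnet_ic" /> <item ref="words" />
  <item ref="maxent_ne_chunker" /> <item ref="punkt" /> <item ref="snowball_data" />
  <item ref="averaged_perceptron_tagger" />
</collection>
"""


def collection_ids(xml_text):
    """Return the resource ids referenced by <item ref="..."/> elements, in order."""
    root = ET.fromstring(xml_text.strip())
    return [item.get("ref") for item in root.iter("item")]
```

Each returned id can then be passed individually to nltk.download().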
If you want to avoid errors when downloading larger datasets from NLTK (from https://stackoverflow.com/a/38135306/610569):
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')
Since v3.2.5, NLTK gives a more informative error message when an nltk_data resource is not found, e.g.:
>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
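Since the LookupError above is raised by find() in nltk/data.py, one common pattern is to look a resource up first and call nltk.download only when it is missing. A minimal sketch, assuming the tokenizers/punkt resource path seen in the traceback. ensure_resource and its injectable find/download arguments are hypothetical, added so the logic can be tested offline; by default they fall back to nltk.data.find and nltk.download.

```python
def ensure_resource(path, resource_id, find=None, download=None):
    """Return True if `path` was already installed, False if a download was triggered."""
    if find is None or download is None:
        import nltk  # lazy import: only needed when no stand-ins are passed

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is not on nltk.data.path.
        find(path)
        return True
    except LookupError:
        download(resource_id)
        return False
```

Calling ensure_resource('tokenizers/punkt', 'punkt') before word_tokenize avoids the error on a fresh machine, at the cost of one lookup per call.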
To find the nltk_data directory automatically, see https://stackoverflow.com/a/36383314/610569
To download nltk_data to a different path, see https://stackoverflow.com/a/48634212/610569
To configure the nltk_data path (i.e. set a different path for NLTK to find nltk_data), see https://stackoverflow.com/a/22987374/610569
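A rough sketch of the path handling behind those links. NLTK searches the directories in the list nltk.data.path in order (the "Searched in:" list in the error message); nltk.download() accepts a download_dir argument; and NLTK also reads the NLTK_DATA environment variable when it is first imported. existing_data_dirs and use_custom_data_dir are hypothetical helpers, and ~/my_nltk_data is an arbitrary example path, not an NLTK default. Both helpers take an optional list so they can be tried without NLTK installed.

```python
import os


def existing_data_dirs(paths=None):
    """Return the entries of nltk.data.path (or `paths`) that are existing directories."""
    if paths is None:
        import nltk  # lazy import so the function can be tested with an explicit list

        paths = nltk.data.path
    # Empty strings (like the '' entry in the error message) are skipped.
    return [p for p in paths if p and os.path.isdir(p)]


def use_custom_data_dir(directory, search_path=None):
    """Create `directory` if needed and put it at the front of NLTK's search path."""
    if search_path is None:
        import nltk  # lazy import; pass search_path to test without NLTK

        search_path = nltk.data.path
    os.makedirs(directory, exist_ok=True)
    if directory not in search_path:
        # The search path is scanned in order, so index 0 is checked first.
        search_path.insert(0, directory)
    return search_path


# Typical use (requires NLTK and network access):
#   data_dir = os.path.expanduser("~/my_nltk_data")  # arbitrary example path
#   use_custom_data_dir(data_dir)
#   nltk.download("punkt", download_dir=data_dir)
```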