We can download all nltk data using:
> import nltk
> nltk.download('all')
Or specific data using:
> nltk.download('punkt')
> nltk.download('maxent_treebank_pos_tagger')
But I want to download all data except the 'corpora' files: for example, all chunkers, grammars, models, stemmers, taggers, tokenizers, etc.
Is there any way to do so without the Downloader UI? Something like:
> nltk.download('all-taggers')
Download individual packages from https://www.nltk.org/nltk_data/ (see the “download” links). Unzip them to the appropriate subfolder.
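A minimal sketch of the unzip step, using only the standard library. The function name install_package_zip is hypothetical, and the example paths in the trailing comment assume that ~/nltk_data is one of the directories NLTK searches and that the archive contains a top-level folder named after the package; check both against your installation.

```python
import zipfile
from pathlib import Path


def install_package_zip(zip_path, data_dir, subdir):
    """Unzip a manually downloaded package archive into data_dir/subdir.

    Hypothetical helper: `subdir` is the category folder the package
    belongs to, e.g. 'tokenizers' or 'taggers'.
    """
    target = Path(data_dir) / subdir
    # Create the category folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example call (assumed paths, adjust to your setup):
# install_package_zip("punkt.zip", Path.home() / "nltk_data", "tokenizers")
```

The category folder matters because NLTK looks up resources by paths such as tokenizers/punkt, so an archive unzipped into the wrong subfolder will not be found.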
List all corpora ids and set _status_cache[pkg.id] = 'installed' for each of them.
This marks every corpora package as 'installed', so those packages are skipped when we call download() on the same Downloader instance.
Instead of downloading all corpora and models, if you're unsure which corpora/packages you need, use nltk.download('popular').
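import nltk
dwlr = nltk.downloader.Downloader()
# Mark every corpus as already installed so the downloader skips it.
# Note: _status_cache is a private attribute of Downloader.
for pkg in dwlr.corpora():
    dwlr._status_cache[pkg.id] = 'installed'
dwlr.download('popular')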
To download all packages of a specific folder:
import nltk
dwlr = nltk.downloader.Downloader()
# Possible subdir values: chunkers, corpora, grammars, help, misc,
# models, sentiment, stemmers, taggers, tokenizers
for pkg in dwlr.packages():
    if pkg.subdir == 'taggers':
        dwlr.download(pkg.id)
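The same subdir check can be inverted to get what the question asks for: everything except corpora. This is only a sketch. The helper names ids_outside_subdirs and download_all_except are hypothetical, and the code assumes, as in the snippet above, that each package object exposes id and subdir attributes.

```python
def ids_outside_subdirs(packages, excluded=("corpora",)):
    """Return the ids of packages whose subdir is not in `excluded`."""
    excluded = set(excluded)
    return [pkg.id for pkg in packages if pkg.subdir not in excluded]


def download_all_except(excluded=("corpora",)):
    """Download every indexed package outside the excluded subfolders."""
    # Imported here so the pure filtering helper above works without NLTK.
    from nltk.downloader import Downloader

    dwlr = Downloader()
    for pkg_id in ids_outside_subdirs(dwlr.packages(), excluded):
        dwlr.download(pkg_id)


# Example call (needs network access):
# download_all_except()
```

Keeping the filter separate from the download loop makes it possible to print and inspect the id list before fetching anything.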