How to install NLTK data in windows (Anaconda)

I need some NLTK data packages in my code. I tried installing them with the command below, but it installs all the packages, including ones I do not need:

conda install -c conda-forge nltk_data

How can I install specific NLTK data packages, such as stopwords and punkt?

arush1836 asked Jul 20 '18 09:07



2 Answers

After installing NLTK with pip, run the following code in IPython:

import nltk
nltk.download()

This opens a GUI where you can download all the data, or select only the specific packages you want.
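The same selection can also be made without the GUI, because nltk.download accepts a package or collection identifier. Here is a minimal sketch of that idea. The helper name download_packages and its download parameter are assumptions made for illustration and are not part of NLTK. NLTK is imported lazily so the helper can be exercised with a stand-in function.

```python
def download_packages(packages, download=None):
    """Download each named NLTK data package and report whether it succeeded.

    `download` is a hypothetical hook for illustration; by default it is
    nltk.download, which returns True on success and False on failure.
    """
    if download is None:
        import nltk  # imported here so the module loads even without NLTK

        download = nltk.download

    results = {}
    for name in packages:
        # One call per identifier, e.g. "stopwords" or "punkt".
        results[name] = bool(download(name))
    return results
```

For example, download_packages(["stopwords", "punkt"]) would fetch only those two packages instead of the whole data set.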

InAFlash answered Nov 11 '22 22:11


From the NLTK documentation:

Run the Python interpreter and type the commands:

import nltk
nltk.download()

A new window will pop up where you can select the packages that you wish to install.

Alternatively, you can use

python -m nltk.downloader <collection|package|all>

to install the package or collection you want, or use all to install all of them.
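In an Anaconda setup it can matter which interpreter runs that command, so one option is to launch it from Python itself. The sketch below assumes the NLTK data should go into the environment of the running interpreter. The function names downloader_command and run_downloader are hypothetical; the -d option is the downloader's flag for choosing a target directory.

```python
import subprocess
import sys


def downloader_command(packages, download_dir=None):
    """Build the argument list for `python -m nltk.downloader`."""
    # sys.executable is the interpreter currently running, so the command
    # uses the same (conda) environment as this script.
    command = [sys.executable, "-m", "nltk.downloader"]
    if download_dir is not None:
        # -d selects the directory the data is written to.
        command += ["-d", download_dir]
    command += list(packages)
    return command


def run_downloader(packages, download_dir=None):
    """Run the downloader; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(downloader_command(packages, download_dir), check=True)
```

Calling run_downloader(["stopwords", "punkt"]) would then correspond to running python -m nltk.downloader stopwords punkt.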

Here is a list of the packages and collections that you can use in this command, extracted from nltk_data gh-pages.

Packages

  • maxent_ne_chunker
  • abc
  • alpino
  • biocreative_ppi
  • brown
  • brown_tei
  • cess_cat
  • cess_esp
  • chat80
  • city_database
  • cmudict
  • comparative_sentences
  • comtrans
  • conll2000
  • conll2002
  • conll2007
  • crubadan
  • dependency_treebank
  • dolch
  • europarl_raw
  • floresta
  • framenet_v15
  • framenet_v17
  • gazetteers
  • genesis
  • gutenberg
  • ieer
  • inaugural
  • indian
  • jeita
  • kimmo
  • knbc
  • lin_thesaurus
  • mac_morpho
  • machado
  • masc_tagged
  • movie_reviews
  • mte_teip5
  • names
  • nombank.1.0
  • nonbreaking_prefixes
  • nps_chat
  • omw
  • opinion_lexicon
  • panlex_swadesh
  • paradigms
  • pe08
  • pil
  • pl196x
  • ppattach
  • problem_reports
  • product_reviews_1
  • product_reviews_2
  • propbank
  • pros_cons
  • ptb
  • qc
  • reuters
  • rte
  • semcor
  • senseval
  • sentence_polarity
  • sentiwordnet
  • shakespeare
  • sinica_treebank
  • smultron
  • state_union
  • stopwords
  • subjectivity
  • swadesh
  • switchboard
  • timit
  • toolbox
  • treebank
  • twitter_samples
  • udhr
  • udhr2
  • unicode_samples
  • universal_treebanks_v20
  • verbnet
  • webtext
  • wordnet
  • wordnet_ic
  • words
  • ycoe
  • basque_grammars
  • book_grammars
  • large_grammars
  • sample_grammars
  • spanish_grammars
  • tagsets
  • mwa_ppdb
  • perluniprops
  • bllip_wsj_no_aux
  • moses_sample
  • wmt15_eval
  • word2vec_sample
  • vader_lexicon
  • porter_test
  • rslp
  • snowball_data
  • averaged_perceptron_tagger
  • averaged_perceptron_tagger_ru
  • maxent_treebank_pos_tagger
  • universal_tagset
  • punkt

Collections and the packages contained within them

  • all-corpora
    • abc
    • alpino
    • biocreative_ppi
    • brown
    • brown_tei
    • cess_cat
    • cess_esp
    • chat80
    • city_database
    • cmudict
    • comtrans
    • conll2000
    • conll2002
    • conll2007
    • crubadan
    • dependency_treebank
    • dolch
    • floresta
    • framenet_v15
    • framenet_v17
    • gazetteers
    • genesis
    • gutenberg
    • ieer
    • inaugural
    • indian
    • jeita
    • kimmo
    • knbc
    • lin_thesaurus
    • mac_morpho
    • machado
    • masc_tagged
    • movie_reviews
    • names
    • nombank.1.0
    • nps_chat
    • omw
    • paradigms
    • pil
    • pl196x
    • ppattach
    • problem_reports
    • propbank
    • ptb
    • qc
    • reuters
    • rte
    • semcor
    • senseval
    • sentiwordnet
    • shakespeare
    • sinica_treebank
    • state_union
    • stopwords
    • swadesh
    • switchboard
    • timit
    • toolbox
    • treebank
    • udhr
    • udhr2
    • unicode_samples
    • universal_treebanks_v20
    • verbnet
    • webtext
    • wordnet
    • wordnet_ic
    • words
    • ycoe
    • panlex_swadesh
    • mte_teip5
    • nonbreaking_prefixes
  • all-nltk
    • abc
    • alpino
    • biocreative_ppi
    • brown
    • brown_tei
    • cess_cat
    • cess_esp
    • chat80
    • city_database
    • cmudict
    • comparative_sentences
    • comtrans
    • conll2000
    • conll2002
    • conll2007
    • crubadan
    • dependency_treebank
    • europarl_raw
    • floresta
    • framenet_v15
    • framenet_v17
    • gazetteers
    • genesis
    • gutenberg
    • ieer
    • inaugural
    • indian
    • jeita
    • kimmo
    • knbc
    • lin_thesaurus
    • mac_morpho
    • machado
    • masc_tagged
    • moses_sample
    • movie_reviews
    • names
    • nombank.1.0
    • nps_chat
    • omw
    • opinion_lexicon
    • paradigms
    • pil
    • pl196x
    • ppattach
    • problem_reports
    • propbank
    • ptb
    • product_reviews_1
    • product_reviews_2
    • pros_cons
    • qc
    • reuters
    • rte
    • semcor
    • senseval
    • sentiwordnet
    • sentence_polarity
    • shakespeare
    • sinica_treebank
    • smultron
    • state_union
    • stopwords
    • subjectivity
    • swadesh
    • switchboard
    • timit
    • toolbox
    • treebank
    • twitter_samples
    • udhr
    • udhr2
    • unicode_samples
    • universal_treebanks_v20
    • verbnet
    • webtext
    • wordnet
    • wordnet_ic
    • words
    • ycoe
    • rslp
    • maxent_treebank_pos_tagger
    • universal_tagset
    • maxent_ne_chunker
    • punkt
    • book_grammars
    • sample_grammars
    • spanish_grammars
    • basque_grammars
    • large_grammars
    • tagsets
    • snowball_data
    • bllip_wsj_no_aux
    • word2vec_sample
    • panlex_swadesh
    • mte_teip5
    • averaged_perceptron_tagger
    • perluniprops
    • nonbreaking_prefixes
    • vader_lexicon
    • porter_test
    • wmt15_eval
    • mwa_ppdb
  • all
    • abc
    • alpino
    • biocreative_ppi
    • brown
    • brown_tei
    • cess_cat
    • cess_esp
    • chat80
    • city_database
    • cmudict
    • comparative_sentences
    • comtrans
    • conll2000
    • conll2002
    • conll2007
    • crubadan
    • dependency_treebank
    • dolch
    • europarl_raw
    • floresta
    • framenet_v15
    • framenet_v17
    • gazetteers
    • genesis
    • gutenberg
    • ieer
    • inaugural
    • indian
    • jeita
    • kimmo
    • knbc
    • lin_thesaurus
    • mac_morpho
    • machado
    • masc_tagged
    • moses_sample
    • movie_reviews
    • names
    • nombank.1.0
    • nps_chat
    • omw
    • opinion_lexicon
    • paradigms
    • pil
    • pl196x
    • ppattach
    • problem_reports
    • propbank
    • ptb
    • product_reviews_1
    • product_reviews_2
    • pros_cons
    • qc
    • reuters
    • rte
    • semcor
    • senseval
    • sentiwordnet
    • sentence_polarity
    • shakespeare
    • sinica_treebank
    • smultron
    • state_union
    • stopwords
    • subjectivity
    • swadesh
    • switchboard
    • timit
    • toolbox
    • treebank
    • twitter_samples
    • udhr
    • udhr2
    • unicode_samples
    • universal_treebanks_v20
    • verbnet
    • webtext
    • wordnet
    • wordnet_ic
    • words
    • ycoe
    • rslp
    • maxent_treebank_pos_tagger
    • universal_tagset
    • maxent_ne_chunker
    • punkt
    • book_grammars
    • sample_grammars
    • spanish_grammars
    • basque_grammars
    • large_grammars
    • tagsets
    • snowball_data
    • bllip_wsj_no_aux
    • word2vec_sample
    • panlex_swadesh
    • mte_teip5
    • averaged_perceptron_tagger
    • perluniprops
    • nonbreaking_prefixes
    • vader_lexicon
    • porter_test
    • wmt15_eval
    • mwa_ppdb
  • book
    • abc
    • brown
    • chat80
    • cmudict
    • conll2000
    • conll2002
    • dependency_treebank
    • genesis
    • gutenberg
    • ieer
    • inaugural
    • movie_reviews
    • nps_chat
    • names
    • ppattach
    • reuters
    • senseval
    • state_union
    • stopwords
    • swadesh
    • timit
    • treebank
    • toolbox
    • udhr
    • udhr2
    • unicode_samples
    • webtext
    • wordnet
    • wordnet_ic
    • words
    • maxent_treebank_pos_tagger
    • maxent_ne_chunker
    • universal_tagset
    • punkt
    • book_grammars
    • city_database
    • tagsets
    • panlex_swadesh
    • averaged_perceptron_tagger
  • popular
    • cmudict
    • gazetteers
    • genesis
    • gutenberg
    • inaugural
    • movie_reviews
    • names
    • shakespeare
    • stopwords
    • treebank
    • twitter_samples
    • omw
    • wordnet
    • wordnet_ic
    • words
    • maxent_ne_chunker
    • punkt
    • snowball_data
    • averaged_perceptron_tagger
  • tests
    • averaged_perceptron_tagger
    • porter_test
    • twitter_samples
    • wmt15_eval
    • subjectivity
    • framenet_v17
    • product_reviews_1
    • product_reviews_2
    • vader_lexicon
    • crubadan
    • mte_teip5
    • sentence_polarity
    • universal_treebanks_v20
    • panlex_swadesh
    • nonbreaking_prefixes
    • perluniprops
    • pros_cons
    • opinion_lexicon
    • comparative_sentences
  • third-party
    • dolch
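
To decide whether a small collection already covers the packages you need, the lists above can be checked mechanically. The following is only a sketch: it copies the popular and tests collections from the list above, and the helper names collections_containing and not_covered are made up for this example.

```python
# Contents of two collections, copied from the list above.
POPULAR = {
    "cmudict", "gazetteers", "genesis", "gutenberg", "inaugural",
    "movie_reviews", "names", "shakespeare", "stopwords", "treebank",
    "twitter_samples", "omw", "wordnet", "wordnet_ic", "words",
    "maxent_ne_chunker", "punkt", "snowball_data",
    "averaged_perceptron_tagger",
}
TESTS = {
    "averaged_perceptron_tagger", "porter_test", "twitter_samples",
    "wmt15_eval", "subjectivity", "framenet_v17", "product_reviews_1",
    "product_reviews_2", "vader_lexicon", "crubadan", "mte_teip5",
    "sentence_polarity", "universal_treebanks_v20", "panlex_swadesh",
    "nonbreaking_prefixes", "perluniprops", "pros_cons",
    "opinion_lexicon", "comparative_sentences",
}
COLLECTIONS = {"popular": POPULAR, "tests": TESTS}


def collections_containing(package, collections=COLLECTIONS):
    """Return the sorted names of the collections that include `package`."""
    return sorted(name for name, members in collections.items() if package in members)


def not_covered(wanted, collection):
    """Return the wanted packages that `collection` does not include."""
    return sorted(set(wanted) - set(collection))


print(collections_containing("stopwords"))
print(not_covered(["stopwords", "punkt", "vader_lexicon"], POPULAR))
```

Anything reported by not_covered would still have to be downloaded individually by its package name.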

Bram Vanroy answered Nov 12 '22 00:11