
Download all NLTK packages in Google Colaboratory at once

I want to use stopwords in my code on Google Colab. Importing nltk raises no errors, but when I actually use stopwords in my code, Colab gives this error:

Resource 'corpora/stopwords.zip/stopwords/' not found.  Please
use the NLTK Downloader to obtain the resource:  >>>
nltk.download()

But when I run:

 import nltk
 nltk.download()

It shows me the full package list, so I have to select one package to download. In a terminal I could enter "all" to download all packages, but how can I do that in Google Colab? I don't want to type a name every time I download something. This is what Colab shows me when I run "nltk.download()":

NLTK Downloader

d) Download l) List u) Update c) Config h) Help q) Quit

 Downloader> d

 Download which package (l=list; x=cancel)?

Is there any way I can download all NLTK packages at once to my project in Google Colab?

Asim asked Mar 03 '18 15:03


3 Answers

I reached this page when I faced the same problem.
Passing "popular" to nltk.download() works for me in Google Colab:

import nltk
nltk.download("popular")

Kazumi answered Nov 12 '22 07:11


Use:

import nltk

nltk.download('all')

It worked for me.
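If only the stopwords corpus is needed, as in the question, a much smaller download may be enough. The following is a minimal sketch, assuming the standard `stopwords` package id and the `quiet` keyword of `nltk.download`. The helper name `download_packages` and its `downloader` parameter are made up for illustration and are not part of NLTK.

```python
def download_packages(package_ids, downloader=None):
    """Download each NLTK package id non-interactively; return {id: success}."""
    if downloader is None:
        # Imported lazily so the helper can also be called with a stand-in downloader.
        import nltk
        downloader = nltk.download
    results = {}
    for package_id in package_ids:
        # Passing an id skips the interactive menu; quiet=True suppresses progress output.
        results[package_id] = bool(downloader(package_id, quiet=True))
    return results


# Example for a Colab cell (needs network access):
# download_packages(["stopwords"])
# from nltk.corpus import stopwords
# english_stopwords = set(stopwords.words("english"))
```

Because a package id is passed directly, the interactive "Downloader>" prompt from the question should not appear.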

Himanshu kumar Singh answered Nov 12 '22 05:11


You have several other options:

all-corpora......... All the corpora
all-nltk............ All packages available on nltk_data gh-pages
                           branch
all................. All packages
book................ Everything used in the NLTK Book
popular............. Popular packages
tests............... Packages for running tests

You can use them as follows:

import nltk
nltk.download('book')
# or
nltk.download('tests')
# or
nltk.download('all-corpora')  # not recommended, as it downloads a huge amount of data
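Since the larger collections take a while to fetch, it may help to download a package only when it is missing from the current Colab session. The following is a rough sketch, assuming `nltk.data.find` raises `LookupError` for a resource that is not installed. The helper name `ensure_resource` and its `finder` and `downloader` parameters are hypothetical and not part of NLTK.

```python
def ensure_resource(resource_path, package_id, finder=None, downloader=None):
    """Download package_id only if resource_path is missing; return True if a download ran."""
    if finder is None or downloader is None:
        # Imported lazily so stand-in callables can be supplied instead.
        import nltk
        finder = finder or nltk.data.find
        downloader = downloader or nltk.download
    try:
        finder(resource_path)  # raises LookupError when the resource is absent
        return False
    except LookupError:
        downloader(package_id, quiet=True)
        return True


# Example for a Colab cell (needs network access on the first run):
# ensure_resource("corpora/stopwords", "stopwords")
```

Running the cell a second time in the same session should then skip the download.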

Krishna answered Nov 12 '22 06:11