
Not able to download nltk data for framenet_v15

Tags:

download

nltk

I am trying to download all the data packages for NLTK, but it always fails while trying to download framenet_v15. It simply hangs there.

I tried multiple times from the same machine, each time leaving it for almost 30 minutes, and once for more than an hour. I also tried switching the source server to Google SVN, but the downloader gave an error.

Unfortunately, I don't have any other information. Is there a way to figure out what the problem is? Or is there an alternate source from which I can download the NLTK data?

Thanks.

Edit:

I finally downloaded it with wget -c; it took a lot of retries before the download completed.
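For reference, wget -c resumes a partial file by requesting only the missing bytes with an HTTP Range header, and retrying repeats that until the file is complete. Below is a minimal standard-library Python sketch of the same idea. It is not part of NLTK: http_opener and resume_download are hypothetical helper names, the retry count and delay are arbitrary, and the sketch assumes the server honours Range requests (status 206); if the server answers 200 instead, it simply starts the file over.

```python
import http.client
import os
import re
import time
import urllib.error
import urllib.request
from typing import BinaryIO, Callable, Optional, Tuple

# Hypothetical opener type: given a byte offset, return
# (HTTP status, readable stream, total file size or None if unknown).
Opener = Callable[[int], Tuple[int, BinaryIO, Optional[int]]]


def http_opener(url: str, timeout: float = 60.0) -> Opener:
    """Build an opener that asks the server to resume at `offset` via a Range header."""
    def open_at(offset: int) -> Tuple[int, BinaryIO, Optional[int]]:
        req = urllib.request.Request(url)
        if offset > 0:
            req.add_header("Range", f"bytes={offset}-")
        # The timeout turns a frozen connection into an exception instead of a hang.
        resp = urllib.request.urlopen(req, timeout=timeout)
        total = None
        if resp.status == 206:
            # Content-Range looks like "bytes 3000-10239/10240".
            m = re.search(r"/(\d+)$", resp.headers.get("Content-Range", ""))
            total = int(m.group(1)) if m else None
        elif resp.headers.get("Content-Length"):
            total = int(resp.headers["Content-Length"])
        return resp.status, resp, total
    return open_at


def resume_download(open_at: Opener, dest: str, max_attempts: int = 50,
                    delay: float = 5.0, chunk_size: int = 64 * 1024) -> str:
    """Fetch into `dest`, resuming from the bytes already on disk after each failure."""
    for attempt in range(1, max_attempts + 1):
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        try:
            status, stream, total = open_at(offset)
            with stream:
                # 206: the server honoured the Range request, so append.
                # Anything else (normally 200): full body again, so overwrite.
                mode = "ab" if status == 206 else "wb"
                with open(dest, mode) as out:
                    for chunk in iter(lambda: stream.read(chunk_size), b""):
                        out.write(chunk)
            # A dropped connection can end the stream early without raising,
            # so only accept the file once it reaches the advertised size.
            size = os.path.getsize(dest)
            if total is None or size >= total:
                return dest
            print(f"attempt {attempt}: stream ended early at {size} bytes")
        except urllib.error.HTTPError as exc:
            if exc.code == 416:  # Range Not Satisfiable: nothing left to fetch
                return dest
            print(f"attempt {attempt}: HTTP {exc.code}")
        except (OSError, http.client.HTTPException) as exc:
            print(f"attempt {attempt}: {exc!r}")
        time.sleep(delay)
    raise RuntimeError(f"giving up after {max_attempts} attempts")
```

Usage with the direct link given in one of the answers below (the file may have moved since): resume_download(http_opener("http://nltk.github.com/nltk_data/packages/corpora/framenet_v15.zip"), "framenet_v15.zip"). Separating the opener from the retry loop keeps the resume logic independent of the network, so it can be exercised with an in-memory stand-in.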

Some observations

  1. After some amount of data has been downloaded, the connection freezes. The server is not reachable by ping.
  2. The downloadable data is served from the same server that hosts nltk.org.
  3. Whenever the download freezes, the site is also unavailable (not nltk.org itself, but other sites on that server for which caching is not enabled). Evidently the server is unable to serve requests.
  4. Maybe there is a resource leak that manifests during this download.
  5. There might be a process restart that makes the server available again after some time (~2 minutes).
  6. Why don't large downloads use torrents? It would be just another option for downloading.
asked Jan 13 '14 16:01 by Biswanath


People also ask

Where can I download NLTK data?

Download individual packages from https://www.nltk.org/nltk_data/ (see the "download" links) and unzip them into the appropriate subfolder. For example, the Brown Corpus at https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip should be unzipped to nltk_data/corpora/brown.

How do I manually download Punkt?

Go to the nltk_data GitHub repository, download the package you need, and unzip it. For example, in the punkt case, download the punkt zip file; unzipping it produces a folder named punkt.
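The manual route described above (download a package zip, then unzip it into the matching nltk_data subfolder) can be scripted with Python's zipfile module. This is a sketch under assumptions: install_package_zip is a hypothetical helper, and it assumes each package zip contains a single top-level folder named after the package, as the punkt example above describes.

```python
import os
import zipfile


def install_package_zip(zip_path: str, nltk_data_dir: str, category: str) -> str:
    """Unzip a manually downloaded NLTK package into nltk_data/<category>/.

    `category` is the subfolder the package belongs in, e.g. "corpora" for
    brown.zip or "tokenizers" for punkt.zip.
    """
    target_dir = os.path.join(nltk_data_dir, category)
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # Assumption: the zip holds one top-level folder named after the
        # package (brown/, punkt/), so extracting here yields
        # nltk_data/corpora/brown, nltk_data/tokenizers/punkt, and so on.
        zf.extractall(target_dir)
    package = os.path.splitext(os.path.basename(zip_path))[0]
    return os.path.join(target_dir, package)
```

For example, install_package_zip("brown.zip", os.path.expanduser("~/nltk_data"), "corpora") should produce ~/nltk_data/corpora/brown. If the data lives somewhere NLTK does not search by default, add that directory to nltk.data.path or set the NLTK_DATA environment variable.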


2 Answers

EDIT: Here is a direct link that will allow you to request the data from the FrameNet project: https://framenet.icsi.berkeley.edu/fndrupal/framenet_request_data

When I downloaded the NLTK data I had to run the downloader several times since it kept hanging.

Alternatively, here is a list of the individual files: http://nltk.org/nltk_data/

I just downloaded framenet_v15 from this link: http://nltk.github.com/nltk_data/packages/corpora/framenet_v15.zip
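Because the download in the question kept stalling part-way, it is worth checking that a manually downloaded zip such as framenet_v15.zip is complete before unzipping it. A hedged sketch using Python's zipfile module (zip_is_complete is a hypothetical helper name): a file truncated mid-transfer normally loses the zip's central directory, which sits at the end of the archive, so it cannot even be opened.

```python
import zipfile


def zip_is_complete(path: str) -> bool:
    """Return True if `path` is a readable zip whose members all pass their CRC check."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the first bad name, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, EOFError, OSError):
        # Truncated download (missing central directory), damaged member
        # data, or a file that does not exist at all.
        return False
```

If it returns False, resuming or repeating the download is the fix; unzipping a damaged archive would fail or leave an incomplete corpus.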

Also, see this question for more discussion: Installing natural language toolkit data

answered Oct 01 '22 13:10 by e h


I downloaded everything with:

import nltk

nltk.download('all')

It worked for me.
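Since the built-in downloader in the question hangs rather than failing, one option is to set a socket timeout and retry nltk.download in a loop. This is a sketch, not an NLTK feature: download_with_retries is a hypothetical helper, and it assumes (a) NLTK's downloader opens URLs without its own timeout, so the global socket default applies, and (b) nltk.download returns False, or raises OSError, when a transfer fails. On each retry the downloader should skip packages that are already installed and up to date, so progress is kept per package, not per byte.

```python
import socket
import time


def download_with_retries(package: str = "all", attempts: int = 10,
                          timeout: float = 60.0, delay: float = 10.0,
                          download=None) -> bool:
    """Retry an NLTK download until it reports success or attempts run out."""
    if download is None:
        import nltk  # imported lazily so a stand-in function can be passed instead
        download = nltk.download
    previous = socket.getdefaulttimeout()
    # Assumption: NLTK opens URLs without its own timeout, so this global
    # default makes a frozen transfer raise instead of hanging forever.
    socket.setdefaulttimeout(timeout)
    try:
        for attempt in range(1, attempts + 1):
            try:
                # Assumption: nltk.download returns True on success, False on failure.
                ok = download(package)
            except OSError as exc:  # socket timeouts are OSError subclasses
                print(f"attempt {attempt}: {exc!r}")
                ok = False
            if ok:
                return True
            if attempt < attempts:
                time.sleep(delay)
        return False
    finally:
        # Restore the previous global timeout for the rest of the program.
        socket.setdefaulttimeout(previous)
```

For the case in the question, download_with_retries("framenet_v15") would keep retrying that one package; a partially downloaded package is likely fetched again from the start, so for a very large file the resumable wget -c approach is still the more reliable choice.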

answered Oct 01 '22 13:10 by Ayush Bairagi