How to install english.pickle for nltk on an offline Linux machine

I am trying to run nltk on a SUSE Linux box which cannot be connected to the internet.

I have successfully installed nltk and it runs, but when I submit

>>> tagged = nltk.pos_tag(tokens)

I get this error:

LookupError:
**********************************************************************
Resource 'tokenizers/punkt/english.pickle' not found. Please use the NLTK Downloader to obtain the resource:

I cannot use the downloader since I can't connect the box to the internet.

Does anyone know how I can install the necessary packages?

Ross Farrelly asked Jul 19 '12 08:07


2 Answers

Data is downloaded to the nltk_data directory. Where that is differs from one system to another, but you can find out by doing the following:

import nltk
print(nltk.data.find('.'))  # parenthesized print works in Python 2 and 3

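english.pickle should go in the subfolder named in the error message, <nltk_data>/tokenizers/punkt/. Depending on your NLTK version, pos_tag may also ask for a tagger model under <nltk_data>/taggers/, which you can copy the same way. The easiest way to put the files there is to use the downloader on a machine that has internet access, then copy them over into the same subfolders. The data files are not platform-specific, so you can download them on a Windows box, no problem.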
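The copy step can be sketched in Python as below. This is only an illustration using the standard library: the helper name copy_resource and both directory arguments are hypothetical, and the resource path is the one quoted in the error message in the question.

```python
import os
import shutil

# Resource path taken from the LookupError quoted in the question.
RESOURCE = os.path.join("tokenizers", "punkt", "english.pickle")


def copy_resource(src_root, dst_root, resource=RESOURCE):
    """Copy one data file from src_root to dst_root, keeping its subfolder.

    src_root: an nltk_data directory copied from the online machine
              (for example on a USB stick); hypothetical argument name.
    dst_root: the nltk_data directory on the offline machine.
    """
    src = os.path.join(src_root, resource)
    dst = os.path.join(dst_root, resource)
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    # Create <dst_root>/tokenizers/punkt/ if it does not exist yet.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```

For example, copy_resource("/media/usb/nltk_data", "/usr/local/lib/nltk_data") would place the file where the second answer says the downloader puts it; both paths are placeholders for whatever applies on your machines. Copying the whole nltk_data folder by hand works just as well.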

alexis answered Sep 20 '22 12:09


The downloader stores the files in a particular folder. I imagine it's possible to download on an online machine and copy the files to the equivalent location on your offline machine. On my machine, it downloads to /usr/local/lib/nltk_data.
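To check which folder on the offline machine actually holds the file, a small standard-library sketch like the following may help. The function names are hypothetical, and the directory list is an assumption based on the locations NLTK conventionally searches on Unix; the authoritative list on your system is nltk.data.path.

```python
import os


def candidate_roots(env=os.environ):
    """Return directories where an nltk_data folder is commonly looked for.

    Assumption: this mirrors the usual Unix defaults; check nltk.data.path
    on the target machine for the real search list.
    """
    roots = []
    if env.get("NLTK_DATA"):
        # NLTK_DATA may hold several directories separated by os.pathsep.
        roots.extend(env["NLTK_DATA"].split(os.pathsep))
    roots.append(os.path.expanduser("~/nltk_data"))
    roots.extend([
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
    ])
    return roots


def find_resource(resource, roots):
    """Return the first root that contains the resource file, or None."""
    for root in roots:
        if os.path.isfile(os.path.join(root, resource)):
            return root
    return None
```

Calling find_resource("tokenizers/punkt/english.pickle", candidate_roots()) returns the matching directory, or None if the file has not been copied into any of them yet.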

darkphoenix answered Sep 22 '22 12:09