
How do I test whether an nltk resource is already installed on the machine running my code?

Tags: python, nlp, nltk

I just started my first NLTK project and am confused about the proper setup. I need several resources, such as the Punkt tokenizer and the maxent POS tagger. I downloaded them myself using the GUI opened by nltk.download(). For my collaborators, of course, I want these resources to be downloaded automatically. I haven't found any idiomatic code for that in the documentation.

Am I supposed to just put nltk.data.load('tokenizers/punkt/english.pickle') and the like into the code? Is this going to download the resources every time the script is run? Should I tell the users (i.e. my co-developers) what is being downloaded and why it is taking so long? There MUST be something out there that does the job, right? :)

Edit: To clarify my question:
How do I test whether an NLTK resource (like the Punkt tokenizer) is already installed on the machine running my code, and install it if it is not?

asked May 16 '14 21:05 by Zakum



1 Answer

You can use the nltk.data.find() function, see https://github.com/nltk/nltk/blob/develop/nltk/data.py:

    >>> import nltk
    >>> nltk.data.find('tokenizers/punkt.zip')
    ZipFilePathPointer(u'/home/alvas/nltk_data/tokenizers/punkt.zip', u'')

When the resource is not available, nltk.data.find() raises a LookupError:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/nltk-3.0a3-py2.7.egg/nltk/data.py", line 615, in find
        raise LookupError(resource_not_found)
    LookupError:
    **********************************************************************
      Resource u'punkt.zip' not found.  Please use the NLTK Downloader
      to obtain the resource:  >>> nltk.download()
      Searched in:
        - '/home/alvas/nltk_data'
        - '/usr/share/nltk_data'
        - '/usr/local/share/nltk_data'
        - '/usr/lib/nltk_data'
        - '/usr/local/lib/nltk_data'
    **********************************************************************

Most probably, you would like to do something like this to ensure that your collaborators have the package:

    >>> try:
    ...     nltk.data.find('tokenizers/punkt')
    ... except LookupError:
    ...     nltk.download('punkt')
    ...
    [nltk_data] Downloading package punkt to /home/alvas/nltk_data...
    [nltk_data]   Package punkt is already up-to-date!
    True
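If you need several resources, the same try/except pattern can be wrapped in a small helper. The following is only a sketch: ensure_resources is a hypothetical name, not part of NLTK, and the find/download parameters are an assumption added here so the helper can be tried out with stand-in functions. By default it falls back to nltk.data.find and nltk.download, the two calls shown above. It also prints a short message before each download, so co-developers can see why the script is pausing.

```python
def ensure_resources(resources, find=None, download=None):
    """Download each NLTK package whose lookup path cannot be found.

    `resources` maps a lookup path (as passed to nltk.data.find) to the
    package id (as passed to nltk.download). Returns the list of package
    ids that had to be downloaded.

    `find` and `download` are hypothetical hooks for testing; when they
    are omitted, the real NLTK functions are used.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the real functions are needed
        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for path, package in resources.items():
        try:
            find(path)  # raises LookupError when the resource is missing
        except LookupError:
            print("NLTK resource %r is missing; downloading package %r"
                  % (path, package))
            download(package)
            fetched.append(package)
    return fetched


# Example call (paths and package ids are illustrative and may differ
# between NLTK versions):
# ensure_resources({
#     'tokenizers/punkt': 'punkt',
#     'taggers/maxent_treebank_pos_tagger': 'maxent_treebank_pos_tagger',
# })
```

Calling this once at program start means resources that are already present are only looked up, not downloaded again.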
answered Sep 28 '22 04:09 by alvas