
Use NLTK without installing [closed]

Tags: python, nlp, nltk

Learning Python with the Natural Language Toolkit has been great fun, and it works fine on my local machine, though I had to install several packages in order to use it. Exactly how the NLTK resources are now integrated on my system remains a mystery to me, though it seems clear that the NLTK source code is not simply sitting someplace where the Python interpreter knows to find it.

I would like to use the Toolkit on my website, which is hosted by another company. Simply uploading the NLTK source code files to my server and telling scripts in the root directory to "import nltk" has not worked; I kind of doubted it would.

What, then, is the difference between whatever the NLTK install routine does and straightforward imports, and why should the toolkit be inaccessible to straightforward imports? Is there a way to use the NLTK source files without essentially altering my host's Python?

Many thanks for your thoughts and notes. -G

Gavin Macready asked Aug 28 '12 20:08


3 Answers

Not only do you need NLTK on your PYTHONPATH (as @dhg points out), you need whatever dependencies it has; a quick local test indicates that this is really only PyYAML. You should just use pip to install packages. It's much less error-prone than trying to manually figure out all the dependencies and tweak the PYTHONPATH accordingly. If this is a shared host where you don't have the proper access to run a pip install, you should ask the host to do it for you.

To address the more general "Whatever the install script is doing" portion of your question: most Python packages are managed using setup.py, which is built on top of distutils (and sometimes setuptools). If this is something you're really interested in, check out The Hitchhiker’s Guide to Packaging.

Hank Gay answered Nov 11 '22 16:11


You don't need system-install support, just the right modules where Python can find them. I've set up NLTK without system install rights with relatively little trouble, but I did have command-line access so I could see what I was doing.

To get this working, you should put together a local install on a computer you do control, ideally one that never had NLTK installed, since you may have forgotten (or never known) what was configured for you. Once you figure out what you need, copy the bundle to the hosting computer. At that point, check that the module versions you're using are appropriate for the web server's architecture. NumPy in particular has different 32- and 64-bit builds, IIRC.
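One way to compare the two machines is to run the same small script on each and look at what the interpreter reports. This is only a sketch using the standard library; the helper name interpreter_bits is made up for illustration.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


# Run this on both the local machine and the host, then compare the output.
print("Python version:", sys.version.split()[0])
print("Machine type:", platform.machine())
print("Interpreter bits:", interpreter_bits())
```

If the two machines disagree on the Python version or the bit width, compiled modules copied from one to the other are unlikely to load.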

It's also worth your while to figure out how to see the error messages from the hosting computer. If you can't see them by default, you could catch ImportError and display the message it contains, or you could redirect stderr... it depends on your configuration.
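A minimal sketch of the "catch ImportError and display the message" idea, assuming your script's standard output is something you can actually see (a CGI script would also need to print a Content-Type header first). The helper name try_import is hypothetical.

```python
import importlib
import traceback


def try_import(module_name):
    """Try to import a module by name.

    Returns (module, None) on success, or (None, traceback_text) if the
    import raised ImportError.
    """
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        return None, traceback.format_exc()


module, error = try_import("nltk")
if error is None:
    # __file__ shows which copy of the package was actually picked up.
    print("nltk imported from", getattr(module, "__file__", "unknown"))
else:
    print("Import failed:")
    print(error)
```

The traceback names the module that could not be found, which is not always nltk itself; it may be one of its dependencies.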

alexis answered Nov 11 '22 15:11


Let's assume that you have the NLTK source located in /some/dir/, so that

dhg /some/dir/$ ls nltk
...
app
book.py
ccg
chat
chunk
classify
...    

You can either launch the python interpreter from the directory in which the nltk source directory is found:

dhg /some/dir/$ python
Python 2.7.1 (r271:86882M, Nov 30 2010, 10:35:34) 
>>> import nltk

Or you can add its location to the PYTHONPATH environment variable, which makes NLTK available from anywhere:

dhg /whatever/$ export PYTHONPATH="$PYTHONPATH:/some/dir/"
dhg /whatever/$ python
Python 2.7.1 (r271:86882M, Nov 30 2010, 10:35:34) 
>>> import nltk

Any other dependencies, including those that NLTK depends on, can also be added to the PYTHONPATH in the same way.
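If you can't set environment variables on the host, the same effect can be had from inside the script by extending sys.path before the import. A rough sketch, reusing the /some/dir example location from above; the helper name add_to_path is made up for illustration.

```python
import os
import sys


def add_to_path(directory):
    """Put directory at the front of sys.path unless it is already listed."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# /some/dir is the example directory that contains the nltk/ source folder.
add_to_path("/some/dir")

# After this, "import nltk" is looked up in /some/dir first, provided the
# nltk folder and its dependencies have actually been copied there.
```

This has to run before the first import of nltk in the script, and it only affects the current process, so the host's Python installation is left untouched.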

dhg answered Nov 11 '22 15:11