Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Installing nltk data dependencies in setup.py script

I use NLTK with wordnet in my project. I did the installation manually on my PC, with pip: pip3 install nltk --user in a terminal, then nltk.download() in a python shell to download wordnet.

I want to automatize these with a setup.py file, but I don't know a good way to install wordnet.

For the moment, I have this piece of code after the call to setup ("nltk" is in the install_requires list of the call to setup):

import sys
if 'install' in sys.argv:
    import nltk
    nltk.download("wordnet")

Is there a better way to do this?

like image 450
Tom Cornebize Avatar asked Nov 07 '14 11:11

Tom Cornebize


People also ask

What is NLTK download (' WordNet ')?

WordNet is a lexical database for the English language, which was created by Princeton, and is part of the NLTK corpus. You can use WordNet alongside the NLTK module to find the meanings of words, synonyms, antonyms, and more.


1 Answers

I managed to install the NLTK data in setup.py by overriding cmdclass with my own Install class :

from setuptools import setup, find_packages
from setuptools.command.install import install as _install


class Install(_install):
    def run(self):
        _install.do_egg_install(self)
        import nltk
        nltk.download("popular")

setup(...
    cmdclass={'install': Install},
    ...
    install_requires=[
      'nltk',
      ],
    setup_requires=['nltk']
    ...
   )

It is important to use the method do_egg_install() in your run() method to make sure nltk gets installed, before import nltk is called (See also here python setuptools install_requires is ignored when overriding cmdclass). Also don't forget to add nltk to setup_requires.

like image 160
asmaier Avatar answered Oct 08 '22 20:10

asmaier