I am new to Python and come from a Java background.
I've got a project which uses nltk and nltk_data. I downloaded nltk_data with nltk.download() on my laptop and the project works fine, but I would like to automate the downloading of nltk_data.
I can download it from the command line, but I want to do it lazily, the way pip downloads a package upon pip install. So my questions are:
Is it possible to install nltk_data as a regular Python package with pip?
Is it possible to download nltk_data lazily?
Go to the GitHub repo, download the package you need, and unzip the file. For example, in the punkt case, download its zip file and unzip it to get a folder named punkt.
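The manual route (download the zip, then unzip it) can be scripted with the standard library. This is only a sketch: install_nltk_zip is a hypothetical helper name, and it assumes the zip was already downloaded by hand and contains a top-level punkt folder. NLTK looks up punkt as tokenizers/punkt inside one of its data directories, such as ~/nltk_data, so the archive is extracted into a tokenizers subfolder.

```python
import zipfile
from pathlib import Path


def install_nltk_zip(zip_path, category, data_dir):
    """Unzip a manually downloaded NLTK package into data_dir/category.

    Hypothetical helper: `category` is the subfolder NLTK expects for the
    package (for punkt this is "tokenizers").
    """
    target = Path(data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Assumes the archive holds a single top-level folder, e.g. punkt/
        archive.extractall(target)
    return target


# Example call (paths are assumptions, adjust to your machine):
# install_nltk_zip("punkt.zip", "tokenizers", Path.home() / "nltk_data")
```

If the data lives somewhere else, NLTK can be pointed at it through the NLTK_DATA environment variable or by appending the directory to nltk.data.path.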
This tokenizer divides a text into a list of sentences by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences. It must be trained on a large collection of plaintext in the target language before it can be used.
Through Anaconda: first, to install Anaconda, go to www.anaconda.com/distribution/#download-section and select the version of Python you need to install. You need to review the output and enter 'yes'. NLTK will then be downloaded and installed in your Anaconda package.
The bottom of the NLTK data documentation explains this:
Run the command python -m nltk.downloader all. To ensure central installation, run the command sudo python -m nltk.downloader -d /usr/local/share/nltk_data all.
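For the lazy part of the question, one common pattern is to look for the resource first and download it only when the lookup fails. The sketch below is an illustration, not an official NLTK recipe: ensure_nltk_resource is a hypothetical helper, while nltk.data.find (which raises LookupError for missing data) and nltk.download are the real NLTK calls it falls back to. The find and download parameters exist only so the helper can be exercised without network access.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` is not installed yet.

    Returns True if a download was triggered, False if the data was
    already present. `find` and `download` are optional hooks (assumption:
    added here for testing); by default the real NLTK functions are used.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed

        if find is None:
            find = nltk.data.find
        if download is None:
            download = lambda name: nltk.download(name, quiet=True)

    try:
        find(resource_path)  # raises LookupError when the data is missing
        return False
    except LookupError:
        if not download(package):
            raise RuntimeError(f"could not download NLTK package {package!r}")
        return True


# Example call at program start-up:
# ensure_nltk_resource("tokenizers/punkt", "punkt")
```

Calling this once before the first use of the tokenizer means a fresh machine fetches punkt on first run, and later runs skip the download.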
If you want to distribute your program, you might want to consider writing a setuptools setup.py file to simplify installation:
What is setup.py?
Official packaging docs