I have been stuck trying to get the Stanford POS Tagger to work for a while. From an old SO post I found the following (slightly modified) code:
stanford_dir = 'C:/Users/.../stanford-postagger-2017-06-09/'
from nltk.tag import StanfordPOSTagger
#from nltk.tag.stanford import StanfordPOSTagger # I tried it both ways
from nltk import word_tokenize
# Add the jar and model via their path (instead of setting environment variables):
jar = stanford_dir + 'stanford-postagger.jar'
model = stanford_dir + 'models/english-left3words-distsim.tagger'
pos_tagger = StanfordPOSTagger(model, jar, encoding='utf8')
text = pos_tagger.tag(word_tokenize("What's the airspeed of an unladen swallow ?"))
print(text)
However, I get the following error:
LookupError:
===========================================================================
NLTK was unable to find the java file!
Use software specific configuration paramaters or set the JAVAHOME environment variable.
===========================================================================
I don't know which java file it is talking about. I'm sure it's finding the right files, because if I change the path to something incorrect, I get a different error:
LookupError: Could not find stanford-postagger.jar jar file at C:/Users/.../stanford-postagger-2017-06-09/sstanford-postagger.jar
What java file is missing? How do I get the Stanford POS tagger to work?
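The "java file" in that message is the Java executable (java / java.exe) that NLTK launches to run the jar, not a .java source file. Below is a minimal, standard-library-only diagnostic sketch. It checks that the jar and model paths exist. It then looks for a Java executable through JAVAHOME (the variable named in the error) and JAVA_HOME, and reports the PATH result for comparison. The helper names are hypothetical. The environment-variable lookup only approximates NLTK's own search, which may differ between versions.

```python
import os
import shutil


def find_java_from_env(env_vars=("JAVAHOME", "JAVA_HOME")):
    """Return a java executable found via the given environment variables, or None.

    Approximation (assumption) of NLTK's search: each variable may point either
    at the executable itself or at a JDK/JRE folder containing it, directly or
    under bin/.
    """
    names = ("java.exe", "java")
    for var in env_vars:
        value = os.environ.get(var)
        if not value:
            continue
        if os.path.isfile(value):
            return value
        for sub in ("", "bin"):
            for name in names:
                candidate = os.path.join(value, sub, name)
                if os.path.isfile(candidate):
                    return candidate
    return None


def check_setup(jar_path, model_path):
    """Report which of the pieces the Stanford tagger wrapper needs are present."""
    return {
        "jar_exists": os.path.isfile(jar_path),
        "model_exists": os.path.isfile(model_path),
        "java_from_env": find_java_from_env(),
        "java_on_path": shutil.which("java"),
    }
```

For example, call check_setup(jar, model) with the jar and model variables from the first snippet. A java_from_env value of None corresponds to the "unable to find the java file" situation.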
EDIT:
I went to this link for Stanford NLP on Windows and tried:
(Second EDIT - adding the installation procedures):
import urllib.request
import zipfile
zip_path = r'C:/Users/HMISYS/Downloads/stanford-postagger-full-2015-04-20.zip'
# Download the full tagger distribution (jar + models)
urllib.request.urlretrieve(r'http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip', zip_path)
# Extract next to the zip; the context manager closes the archive afterwards
with zipfile.ZipFile(zip_path) as zfile:
    zfile.extractall(r'C:/Users/HMISYS/Downloads/')
# End second edit
import nltk  # needed for nltk.word_tokenize below
from nltk.tag.stanford import StanfordPOSTagger
# Trying on an older version
_model_filename = r'C:/Users/HMISYS/Downloads/stanford-postagger-full-2015-04-20/models/english-bidirectional-distsim.tagger'
_path_to_jar = r'C:/Users/HMISYS/Downloads/stanford-postagger-full-2015-04-20/stanford-postagger.jar'
st = StanfordPOSTagger(model_filename=_model_filename, path_to_jar=_path_to_jar)
text = st.tag(nltk.word_tokenize("What's the airspeed of an unladen swallow ?"))
print(text)
but I got the same error. Based on this post, I set my environment variables as follows:
set STANFORDTOOLSDIR=$HOME
set CLASSPATH=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/stanford-postagger.jar
set STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-postagger-full-2015-04-20/models
But I get this error:
NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH environment variable.
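A set command in a cmd window only changes that console session and the programs started from it. A Python interpreter or IDE launched some other way never sees those values. cmd also does not expand $HOME, because its syntax is %VAR% (e.g. %USERPROFILE%). The sketch below sets the same CLASSPATH and STANFORD_MODELS values from inside Python instead. It assumes the install directory from the download step above, and set_stanford_env is a hypothetical helper.

```python
import os


def set_stanford_env(stanford_dir):
    """Set CLASSPATH and STANFORD_MODELS for the current Python process only.

    Unlike `set` in a separate cmd window, os.environ changes are visible to this
    interpreter and inherited by any subprocess it starts.
    """
    jar = os.path.join(stanford_dir, "stanford-postagger.jar")
    models = os.path.join(stanford_dir, "models")
    os.environ["CLASSPATH"] = jar
    os.environ["STANFORD_MODELS"] = models
    return jar, models


# Install location taken from the download step above; adjust as needed.
set_stanford_env("C:/Users/HMISYS/Downloads/stanford-postagger-full-2015-04-20")
```

With these set in the same process, StanfordPOSTagger('english-bidirectional-distsim.tagger') should be able to find the jar and model without explicit paths. This assumes the installed NLTK version reads these two variables. The Java executable still has to be discoverable on its own.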
I added the following lines to my code and it worked:
import os
# NLTK reads JAVAHOME to locate the Java executable; it can point straight at java.exe
java_path = "C:/Program Files/Java/jdk1.8.0_161/bin/java.exe"
os.environ['JAVAHOME'] = java_path
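Hard-coding the jdk1.8.0_161 folder stops working once Java is updated. Below is a sketch of a hypothetical helper, configure_javahome, with three fallbacks. It reuses an existing JAVAHOME if that still exists. Otherwise it picks a java.exe under C:/Program Files/Java, the folder from the path above, which is assumed to be where Java is installed. As a last resort it uses whatever java is on the PATH. Run it before the tagger is created or used.

```python
import glob
import os
import shutil


def configure_javahome(java_root="C:/Program Files/Java"):
    """Point JAVAHOME at a java executable and return its path (or None).

    Order: an existing JAVAHOME that still exists, then the alphabetically last
    <java_root>/*/bin/java.exe (not a true version comparison), then `java`
    found on the PATH.
    """
    current = os.environ.get("JAVAHOME")
    if current and os.path.exists(current):
        return current
    candidates = sorted(glob.glob(os.path.join(java_root, "*", "bin", "java.exe")))
    java = candidates[-1] if candidates else shutil.which("java")
    if java:
        os.environ["JAVAHOME"] = java
    return java
```

Calling configure_javahome() with no arguments uses the default folder. Pass a different root if Java is installed elsewhere. Alphabetical order is only a rough stand-in for "newest", because folder names like jdk-11 and jdk1.8.0_161 do not sort by version.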