NOTE: I am using Python 2.7 as part of the Anaconda distribution. I hope this is not a problem for nltk 3.1.
I am trying to use nltk for NER as follows:
import nltk
from nltk.tag.stanford import StanfordNERTagger
#st = StanfordNERTagger('stanford-ner/all.3class.distsim.crf.ser.gz', 'stanford-ner/stanford-ner.jar')
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
print st.tag(str)
but I get:
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
    at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1117)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1076)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1057)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3088)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 5 more
Traceback (most recent call last):
File "X:\jnk.py", line 47, in <module>
print st.tag(str)
File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 66, in tag
return sum(self.tag_sents([tokens]), [])
File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 89, in tag_sents
stdout=PIPE, stderr=PIPE)
File "X:\Anaconda2\lib\site-packages\nltk\internals.py", line 134, in java
raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : ['X:\\PROGRA~1\\Java\\JDK18~1.0_6\\bin\\java.exe', '-mx1000m', '-cp', 'X:\\stanford\\stanford-ner.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', 'X:\\stanford\\classifiers\\english.all.3class.distsim.crf.ser.gz', '-textFile', 'x:\\appdata\\local\\temp\\tmpqjsoma', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']
But I can see that the slf4j jar is there in my lib folder. Do I need to update an environment variable?
Edit
Thanks everyone for the help, but I still get the same error. Here is what I tried recently:
import nltk
from nltk.tag import StanfordNERTagger
print(nltk.__version__)
stanford_ner_dir = 'X:\\stanford\\'
eng_model_filename = stanford_ner_dir + 'classifiers\\english.all.3class.distsim.crf.ser.gz'
my_path_to_jar = stanford_ner_dir + 'stanford-ner.jar'
st = StanfordNERTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar)
print st._stanford_model
print st._stanford_jar
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
and also:
import nltk
from nltk.tag import StanfordNERTagger
print(nltk.__version__)
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
print st._stanford_model
print st._stanford_jar
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
I get:
3.1
X:\stanford\classifiers\english.all.3class.distsim.crf.ser.gz
X:\stanford\stanford-ner.jar
After that it goes on to print the same stack trace as before, ending in java.lang.ClassNotFoundException: org.slf4j.LoggerFactory.
Any idea why this might be happening? I updated my CLASSPATH as well. I even added all the relevant folders to my PATH environment variable: for example the folder where I unzipped the Stanford jars, the place where I unzipped slf4j, and even the lib folder inside the stanford folder. I have no idea why this is happening :(
Could it be Windows? I have had problems with Windows paths before.
Update
The Stanford NER version I have is 3.6.0; the zip file says stanford-ner-2015-12-09.zip. I also tried using stanford-ner-3.6.0.jar instead of stanford-ner.jar, but I still get the same error.
When I right-click on stanford-ner-3.6.0.jar, I notice this (screenshot not shown), and I see the same thing for all the files that I have extracted, even the slf4j files. Could this be causing the problem?
As for java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory, I do not see any folder named org anywhere.
Update: Env variables
Here are my environment variables:
CLASSPATH
.;
X:\jre1.8.0_60\lib\rt.jar;
X:\stanford\stanford-ner-3.6.0.jar;
X:\stanford\stanford-ner.jar;
X:\stanford\lib\slf4j-simple.jar;
X:\stanford\lib\slf4j-api.jar;
X:\slf4j\slf4j-1.7.13\slf4j-1.7.13\slf4j-log4j12-1.7.13.jar
STANFORD_MODELS
X:\stanford\classifiers
JAVA_HOME
X:\PROGRA~1\Java\JDK18~1.0_6
PATH
X:\PROGRA~1\Java\JDK18~1.0_6\bin;
X:\stanford;
X:\stanford\lib;
X:\slf4j\slf4j-1.7.13\slf4j-1.7.13
Anything wrong here?
NOTE:
Below is a temporary hack to work with; it is NOT meant to be a permanent solution.
Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface the Stanford NLP tools using NLTK!
If you do not want to use this "hack", please track updates on this issue: https://github.com/nltk/nltk/issues/1237, or use the NER tool compiled on 2015-04-20.
Make sure that you have set the CLASSPATH and STANFORD_MODELS environment variables.
To set environment variables in Windows:
set CLASSPATH=%CLASSPATH%;C:\some\path\to\stanford-ner\stanford-ner.jar
set STANFORD_MODELS=%STANFORD_MODELS%;C:\some\path\to\stanford-ner\classifiers
To set environment variables in Linux:
export STANFORDTOOLSDIR=/home/some/path/to/stanfordtools/
export CLASSPATH=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/stanford-ner.jar
export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/classifiers
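If you prefer not to touch the shell configuration, the same variables can also be set from inside Python before the tagger is constructed. A minimal sketch, assuming the Linux layout above (the directory path is a placeholder you must adjust):

import os

# Placeholder path: adjust to wherever you unzipped stanford-ner-2015-12-09.zip.
stanford_dir = '/home/some/path/to/stanfordtools/stanford-ner-2015-12-09'

# NLTK consults these variables when StanfordNERTagger() searches for
# the jar and the model file.
os.environ['CLASSPATH'] = os.path.join(stanford_dir, 'stanford-ner.jar')
os.environ['STANFORD_MODELS'] = os.path.join(stanford_dir, 'classifiers')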
Then:
>>> from nltk.internals import find_jars_within_path
>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
# Note this is where your stanford_jar is saved.
# We are accessing the environment variables you've
# set through the NLTK API.
>>> print st._stanford_jar
/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar
>>> stanford_dir = st._stanford_jar.rpartition("\\")[0] # windows
# Note in linux you do this instead:
>>> stanford_dir = st._stanford_jar.rpartition('/')[0] # linux
# Use the `find_jars_within_path` function to get all the
# jar files out from stanford NER tool under the libs/ dir.
>>> stanford_jars = find_jars_within_path(stanford_dir)
# Put the jars back into the `stanford_jar` classpath.
>>> st._stanford_jar = ':'.join(stanford_jars) # linux
>>> st._stanford_jar = ';'.join(stanford_jars) # windows
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]
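For reference, here is the same hack collected into one plain script. This is just a sketch of the steps above, assuming CLASSPATH and STANFORD_MODELS are already set; os.path.dirname and os.pathsep are used so the same code works on both Windows and Linux:

import os
from nltk.internals import find_jars_within_path
from nltk.tag import StanfordNERTagger

st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')

# Directory that contains stanford-ner.jar.
stanford_dir = os.path.dirname(st._stanford_jar)

# Gather every jar shipped with the NER tool (including the lib/ dir) and
# put them all on the classpath that NLTK passes to java.
stanford_jars = find_jars_within_path(stanford_dir)
st._stanford_jar = os.pathsep.join(stanford_jars)

print(st.tag('Rami Eid is studying at Stony Brook University in NY'.split()))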
I encountered exactly the same problem as you described yesterday.
There are 3 things you need to do.
1) Update your NLTK.
pip install -U nltk
Your version should be >3.1 and I see you are using
from nltk.tag.stanford import StanfordNERTagger
However, you have to use the new module:
from nltk.tag import StanfordNERTagger
2) Download slf4j and update your CLASSPATH.
Here is how you update your CLASSPATH.
import os

javapath = "/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar:/Users/aerin/java/slf4j-1.7.13/slf4j-log4j12-1.7.13.jar"
os.environ['CLASSPATH'] = javapath
As you see above, javapath contains two paths: one is where stanford-ner.jar is, and the other is where you downloaded slf4j-log4j12-1.7.13.jar (it can be downloaded here: http://www.slf4j.org/download.html).
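A small variation of the snippet above, using os.pathsep so the classpath separator is ':' on Linux/macOS and ';' on Windows automatically (the paths are the same example paths as above and need adjusting to your machine):

import os

ner_jar = "/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar"
slf4j_jar = "/Users/aerin/java/slf4j-1.7.13/slf4j-log4j12-1.7.13.jar"

# os.pathsep picks the right classpath separator for the platform.
os.environ['CLASSPATH'] = os.pathsep.join([ner_jar, slf4j_jar])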
3) Don't forget to specify where you downloaded 'english.all.3class.distsim.crf.ser.gz' and 'stanford-ner.jar':
st = StanfordNERTagger('/Users/aerin/Downloads/stanford-ner-2014-06-16/classifiers/english.all.3class.distsim.crf.ser.gz','/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar')
st.tag("Doneyo lab did such an awesome job!".split())
I fixed it!
You should include the full path of slf4j-api.jar in CLASSPATH.
Instead of adding the jar path to the system environment variables, you can do it in code like this:
import os

_CLASS_PATH = "."
if os.environ.get('CLASSPATH') is not None:
    _CLASS_PATH = os.environ.get('CLASSPATH')
os.environ['CLASSPATH'] = _CLASS_PATH + r';F:\Python\Lib\slf4j\slf4j-api-1.7.13.jar'
Important: nltk/*/stanford.py will reset the classpath like this:
stdout, stderr = java(cmd, classpath=self._stanford_jar, stdout=PIPE, stderr=PIPE)
e.g. \Python34\Lib\site-packages\nltk\tokenize\stanford.py, line 90.
You can fix it like this:
_CLASS_PATH = "."
if os.environ.get('CLASSPATH') is not None:
    _CLASS_PATH = os.environ.get('CLASSPATH')
stdout, stderr = java(cmd, classpath=(self._stanford_jar, _CLASS_PATH), stdout=PIPE, stderr=PIPE)
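If you would rather not edit the installed stanford.py at all, a similar effect can be had by appending the slf4j jar to the tagger's (private) _stanford_jar attribute, which is the classpath NLTK hands to java. A sketch only: the model and NER jar paths below are placeholders, and the slf4j path is the one used above.

import os
from nltk.tag import StanfordNERTagger

# Placeholder paths for the model and the NER jar; adjust to your install.
st = StanfordNERTagger(
    r'F:\stanford\classifiers\english.all.3class.distsim.crf.ser.gz',
    r'F:\stanford\stanford-ner.jar')

# Append slf4j to the classpath NLTK will pass via -cp. This relies on a
# private attribute, so it is a fragile workaround.
st._stanford_jar = os.pathsep.join(
    [st._stanford_jar, r'F:\Python\Lib\slf4j\slf4j-api-1.7.13.jar'])

print(st.tag('Rami Eid is studying at Stony Brook University in NY'.split()))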
The current Stanford NER tagger version is not compatible with nltk, because it requires additional jars that nltk cannot add to the CLASSPATH.
Instead, prefer an older version of the Stanford NER Tagger that works perfectly fine, like this one: http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
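With that older release the plain NLTK call is enough, since no extra jars are needed. A minimal sketch, assuming you unzipped the archive to the placeholder directory below:

from nltk.tag import StanfordNERTagger

# Placeholder directory: wherever you unzipped stanford-ner-2015-04-20.zip.
ner_dir = '/home/you/stanford-ner-2015-04-20'

st = StanfordNERTagger(
    ner_dir + '/classifiers/english.all.3class.distsim.crf.ser.gz',
    ner_dir + '/stanford-ner.jar')

print(st.tag('Rami Eid is studying at Stony Brook University in NY'.split()))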
For those who want to use Stanford NER >= 3.6.0 instead of the 2015-01-30 (3.5.1) release or another old version, do this instead:
Put stanford-ner.jar and slf4j-api.jar into the same folder. For example, I put those files into /path-to-libs/.
Then:
import nltk

classpath = "/path-to-libs/*"
st = nltk.tag.StanfordNERTagger(
    "/path-to-model/ner-model.ser.gz",
    "/path-to-libs/stanford-ner-3.6.0.jar"
)
st._stanford_jar = classpath
result = st.tag(["Hello"])
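The wildcard entry /path-to-libs/* makes the JVM put every jar in that folder on the classpath, so slf4j-api.jar is picked up next to stanford-ner-3.6.0.jar without being listed explicitly; overwriting st._stanford_jar is what injects that wildcard into the java call NLTK builds.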