
Using the Stanford NLP libraries from within R, using the rJava package

Does anybody have any experience with using StanfordCoreNLP (http://nlp.stanford.edu/software/corenlp.shtml) through rJava in R? I’ve been struggling to get it to work for two days now, and think I’ve exhausted Google and previous questions on StackOverflow.

Essentially I’m trying to use the StanfordNLP libraries from within R. I have zero Java experience but have worked with other languages, so I understand the basics of classes and objects.

From what I can see, the demo .java file that comes with the libraries seems to show that to use the classes from within Java, you’d import the libraries and then create a new object, along the lines of:

import java.io.*;
import java.util.*;

import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

public class demo {

    // etc.

    StanfordCoreNLP pipeline = new StanfordCoreNLP();

    // etc.
}

From within R, I’ve tried calling some standard Java functions; this works fine, which makes me think the problem lies in the way I’m trying to access the Stanford libraries.

I extracted the Stanford ZIP to h:\stanfordcore, so the .jar files are all in the root of this directory. As well as the various other files contained in the zip, it contains the main .jar files:

  • joda-time.jar
  • stanford-corenlp-1.3.4.jar
  • stanford-corenlp-1.3.4-javadoc.jar
  • stanford-corenlp-1.3.4-models.jar
  • joda-time-2.1-sources.jar
  • jollyday-0.4.7-sources.jar
  • stanford-corenlp-1.3.4-sources.jar
  • xom.jar
  • jollyday.jar

If I try to access the NLP tools from the command line, it works fine.

From within R, I initialized the JVM and set the classpath variable:

.jinit(classpath = " h:/stanfordcore", parameters = getOption("java.parameters"),silent = FALSE, force.init = TRUE)

After this, if I use the command

.jclassPath() 

This shows that the directory containing the required .jar files has been added and gives this output in R:

[1] "H:\RProject-2.15.1\library\rJava\java" "h:\ stanfordcore"

However, when I try to create a new object (not sure if this is the right Java terminology) I get an error.

I’ve tried creating the object in dozens of different ways (basically shooting in the dark though), but the most promising (simply because it seems to actually find the class) is:

pipeline <- .jnew(class="edu/stanford/nlp/pipeline/StanfordCoreNLP",check=TRUE,silent=FALSE)

I know this finds the class, because if I change the class parameter to something not listed in the API, I get a cannot find class error.

As it stands, however, I get the error:

Error in .jnew(class = "edu/stanford/nlp/pipeline/StanfordCoreNLP", check = TRUE, : java.lang.NoClassDefFoundError: Could not initialize class edu.stanford.nlp.pipeline.StanfordCoreNLP

My Googling indicates that this might be something to do with not finding a required .jar file, but I’m completely stuck. Am I missing something obvious?

If anyone can point me even a little in the right direction, I’d be incredibly grateful.

Thanks in advance!

Peter

Peter Taylor asked Dec 18 '12 17:12



1 Answer

Your classpath is wrong - you are pointing it at a directory, but your classes are in JAR files. You have to either unpack all the JAR files into the directory you specify (unusual) or add each of the JAR files to the class path (more common). [And you'll have to fix your typos, obviously, but I assume those come from the fact that you were not using copy/paste.]
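A minimal sketch of the more common option, assuming the JARs sit in h:/stanfordcore as described in the question. The helper jar_files is hypothetical (it is not part of rJava); the rJava calls used are .jinit, .jaddClassPath, .jclassPath and .jnew. The guard only skips the rJava part when the package or the directory is missing.

```r
# Hypothetical helper (not part of rJava): full paths of every .jar in a directory.
jar_files <- function(dir) {
  list.files(dir, pattern = "\\.jar$", full.names = TRUE)
}

# Assumed install location, taken from the question.
corenlp_dir <- "h:/stanfordcore"

if (requireNamespace("rJava", quietly = TRUE) && dir.exists(corenlp_dir)) {
  rJava::.jinit()                                 # start the JVM
  rJava::.jaddClassPath(jar_files(corenlp_dir))   # add each JAR, not the directory
  print(rJava::.jclassPath())                     # the individual JARs should be listed

  # Same class name as in the question; loading the models may need extra memory.
  pipeline <- rJava::.jnew("edu/stanford/nlp/pipeline/StanfordCoreNLP")
}
```

If the class still fails to initialize, comparing the output of .jclassPath() with the JAR list in the question is a reasonable first check, since the pipeline also depends on the models, joda-time, jollyday and xom JARs.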

PS: please use stats-rosuda-devel mailing list if you want more timely answers.

Simon Urbanek answered Nov 14 '22 12:11
