
How to parse a list of sentences?

Tags:

stanford-nlp

I want to parse a list of sentences with the Stanford NLP parser. My list is an ArrayList; how can I parse the whole list with LexicalizedParser?

From each sentence I want to get a parse tree in this form:

Tree parse = (Tree) lp1.apply(sentence);
asked Feb 13 '11 by vitaly87


2 Answers

Although you can dig into the documentation, I am going to provide the code here on SO, especially since links move and/or die. This answer uses the whole CoreNLP pipeline; if you are not interested in the whole pipeline, there is an alternative answer below.

The example below is the complete way of using the Stanford CoreNLP pipeline. If you are not interested in coreference resolution, remove dcoref from the list of annotators. If you feed the pipeline a body of text (the text variable), it does the sentence splitting for you via the ssplit annotator. If you have just one sentence, that is fine too: feed it in as the text variable.

    // imports needed at the top of your class for this example
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import edu.stanford.nlp.dcoref.CorefChain;
    import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
    import edu.stanford.nlp.util.CoreMap;

    // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = ... // Add your text here!

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for(CoreMap sentence: sentences) {
      // traversing the words in the current sentence
      // a CoreLabel is a CoreMap with additional token-specific methods
      for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the POS tag of the token
        String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);       
      }

      // this is the parse tree of the current sentence
      Tree tree = sentence.get(TreeAnnotation.class);

      // this is the Stanford dependency graph of the current sentence
      SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    }

    // This is the coreference link graph
    // Each chain stores a set of mentions that link to each other,
    // along with a method for getting the most representative mention
    // Both sentence and token offsets start at 1!
    Map<Integer, CorefChain> graph = 
      document.get(CorefChainAnnotation.class);
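
To connect this to the original question: if your sentences already live in an ArrayList<String>, one option is to run the same pipeline on each entry and collect the parse trees. The following is a minimal sketch, assuming each list entry holds exactly one sentence; the names mySentences and parsePipeline are just placeholders.

    // minimal sketch: parse every entry of an existing list of sentences
    // (needs java.util.ArrayList in addition to the imports above)
    Properties parseProps = new Properties();
    parseProps.put("annotators", "tokenize, ssplit, pos, parse");
    StanfordCoreNLP parsePipeline = new StanfordCoreNLP(parseProps);

    List<String> mySentences = new ArrayList<String>();   // your ArrayList of sentences
    mySentences.add("This is the first sentence.");
    mySentences.add("This is the second one.");

    List<Tree> trees = new ArrayList<Tree>();
    for (String s : mySentences) {
      // annotate each list entry as its own small document
      Annotation doc = new Annotation(s);
      parsePipeline.annotate(doc);
      // each entry is assumed to contain a single sentence, so this inner loop runs once
      for (CoreMap sent : doc.get(SentencesAnnotation.class)) {
        trees.add(sent.get(TreeAnnotation.class));
      }
    }

Building the pipeline once outside the loop matters: constructing StanfordCoreNLP loads the models, which is far more expensive than annotating a single sentence.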
answered by demongolem


Actually, the Stanford NLP documentation provides a sample of how to parse sentences.

You can find the documentation here
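For completeness, here is a minimal sketch of that route, modeled on the ParserDemo code shipped with the parser distribution. It assumes a reasonably recent release where the model is loaded with LexicalizedParser.loadModel(...) and the englishPCFG grammar is on the classpath; the class name, model path, and example sentences are only placeholders.

    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;

    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.process.CoreLabelTokenFactory;
    import edu.stanford.nlp.process.PTBTokenizer;
    import edu.stanford.nlp.process.Tokenizer;
    import edu.stanford.nlp.process.TokenizerFactory;
    import edu.stanford.nlp.trees.Tree;

    public class ParseSentenceList {
      public static void main(String[] args) {
        // adjust the model path to wherever your englishPCFG grammar lives
        LexicalizedParser lp =
            LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
        TokenizerFactory<CoreLabel> tokenizerFactory =
            PTBTokenizer.factory(new CoreLabelTokenFactory(), "");

        List<String> sentences = new ArrayList<String>();   // your ArrayList of sentences
        sentences.add("I want to parse a list of sentences.");
        sentences.add("Each one should become a Tree.");

        List<Tree> parses = new ArrayList<Tree>();
        for (String sentence : sentences) {
          // tokenize the raw sentence, then parse the token list
          Tokenizer<CoreLabel> tok = tokenizerFactory.getTokenizer(new StringReader(sentence));
          List<CoreLabel> words = tok.tokenize();
          Tree parse = lp.apply(words);
          parses.add(parse);
          parse.pennPrint();
        }
      }
    }

With a tokenized word list, apply(...) returns a Tree directly, so the (Tree) cast from the question should not be needed here.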

answered by Khairul