Is there a way to get the subject of a sentence using OpenNLP?

Tags: java, nlp, opennlp

Is there a way to get the subject of a sentence using OpenNLP? I'm trying to identify the most important part of a user's sentence. Users will be submitting sentences to our "engine", and we want to know exactly what the core topic of each sentence is.

Currently we are using OpenNLP to:

  1. Chunk the sentence
  2. Identify the noun phrases, verbs, etc. in the sentence
  3. Identify all "topics" of the sentence
  4. (NOT YET DONE!) Identify the "core topic" of the sentence
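
For context, steps 1 and 2 can be sketched roughly as follows. This is only an illustration, assuming the chunker has already assigned one BIO-style tag per token (B-NP, I-NP, B-VP, O and so on). The class name `ChunkGrouper`, the method `phrases`, and the tags in `main` are hypothetical and hand-written, not actual chunker output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups BIO chunk tags into phrases of one type. */
final class ChunkGrouper {

    /** Returns every phrase of the given type (e.g. "NP") as a space-joined string. */
    static List<String> phrases(String[] tokens, String[] tags, String type) {
        List<String> out = new ArrayList<>();
        StringBuilder current = null;          // phrase being built, or null
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].equals("B-" + type)) {
                // A new phrase starts; close any phrase still open.
                if (current != null) out.add(current.toString());
                current = new StringBuilder(tokens[i]);
            } else if (tags[i].equals("I-" + type) && current != null) {
                current.append(' ').append(tokens[i]);
            } else {
                // Any other tag ends the open phrase.
                if (current != null) out.add(current.toString());
                current = null;
            }
        }
        if (current != null) out.add(current.toString());
        return out;
    }

    public static void main(String[] args) {
        // Hand-written example tags (an assumption, not real chunker output).
        String[] tokens = {"The", "user", "submitted", "a", "long", "sentence", "."};
        String[] tags   = {"B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "I-NP", "O"};
        System.out.println(phrases(tokens, tags, "NP")); // [The user, a long sentence]
    }
}
```

This yields the candidate "topics" from step 3; it does not say which of them is the core topic.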

Please let me know if you have any bright ideas.

rockit asked Apr 05 '11 18:04


2 Answers

Dependency Parser

If you're interested in extracting grammatical relations such as what word or phrase is the subject of a sentence, you should really use a dependency parser. While OpenNLP does support phrase structure parsing, I don't think it does dependency parsing yet.

Open-Source Software

Packages written in Java that support dependency parsing include:

  • MaltParser
  • MSTParser
  • Stanford Parser (see the typed dependencies section of its demo)
  • RelEx

Of these, the Stanford Parser is the most accurate. However, some configurations of the MaltParser can be insanely fast (Cer et al. 2010).
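
To illustrate why a dependency parse makes this easy, here is a minimal sketch that picks the subject out of typed dependencies written in the textual form `relation(governor-index, dependent-index)`, where subjects carry a relation starting with `nsubj`. The class name `SubjectFromDependencies` is hypothetical, and the dependency lines in `main` are hand-written for the example rather than produced by a parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds subject words in textual typed dependencies. */
final class SubjectFromDependencies {

    // Matches lines such as "nsubj(submit-2, Users-1)".
    private static final Pattern DEP =
        Pattern.compile("([A-Za-z_:]+)\\((.+)-(\\d+), (.+)-(\\d+)\\)");

    /** Returns the dependent word of every relation whose name starts with "nsubj". */
    static List<String> subjects(List<String> dependencies) {
        List<String> result = new ArrayList<>();
        for (String line : dependencies) {
            Matcher m = DEP.matcher(line.trim());
            if (m.matches() && m.group(1).startsWith("nsubj")) {
                result.add(m.group(4));   // group 4 is the dependent word
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written dependencies for "Users submit sentences" (an assumption).
        List<String> deps = Arrays.asList(
            "nsubj(submit-2, Users-1)",
            "root(ROOT-0, submit-2)",
            "dobj(submit-2, sentences-3)");
        System.out.println(subjects(deps)); // [Users]
    }
}
```

Note that this returns only the head word of the subject; recovering the full subject phrase would mean collecting the words that depend on it as well.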

dmcer answered Oct 20 '22 11:10


For the grammatical subject, you'd need to rely on configurational information in the tree. If the parse looks something like (TOP (S (NP ----) (VP ----))), then you can take the NP as the subject; often, though by no means always, that will be the case. However, only some sentences will have this configuration; one can easily imagine structures with subjects that are not in that position -- passive constructions, for example.
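
A rough sketch of that heuristic, working on the bracketed string form of a parse rather than on any particular parser's tree objects. The class `SubjectFromParse` and its methods are hypothetical names, the input strings are hand-written, and the reader assumes well-formed brackets.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for bracketed parses such as (TOP (S (NP ...) (VP ...))). */
final class SubjectFromParse {

    /** A tree node: a phrase or POS label with children, or a bare word. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }

        /** Joins the words under this node, left to right. */
        String text() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(c.text());
            }
            return sb.toString();
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private SubjectFromParse(String bracketed) {
        // Pad parentheses so a whitespace split yields "(", ")", labels and words.
        tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    // Recursive descent; assumes balanced brackets (no error recovery).
    private Node parseNode() {
        if (tokens[pos].equals("(")) {
            pos++;                                // consume "("
            Node node = new Node(tokens[pos++]);  // label after the bracket
            while (!tokens[pos].equals(")")) {
                node.children.add(parseNode());
            }
            pos++;                                // consume ")"
            return node;
        }
        return new Node(tokens[pos++]);           // a word
    }

    /** Returns the NP directly under S that precedes a VP sibling, or null. */
    static String subjectOf(String bracketed) {
        Node top = new SubjectFromParse(bracketed).parseNode();
        if (!top.label.equals("TOP") || top.children.size() != 1) return null;
        Node s = top.children.get(0);
        if (!s.label.equals("S")) return null;
        Node np = null;
        for (Node child : s.children) {
            if (np == null && child.label.equals("NP")) np = child;
            else if (np != null && child.label.equals("VP")) return np.text();
        }
        return null;                              // configuration not found
    }

    public static void main(String[] args) {
        String declarative =
            "(TOP (S (NP (DT The) (NN user)) (VP (VBD submitted) (NP (DT a) (NN sentence))) (. .)))";
        String imperative = "(TOP (S (VP (VB Submit) (NP (DT the) (NN sentence)))))";
        System.out.println(subjectOf(declarative)); // The user
        System.out.println(subjectOf(imperative));  // null
    }
}
```

The imperative case shows the limitation: when the tree does not have the NP-before-VP shape under S, the heuristic simply finds nothing.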

You're probably better off using MaltParser, though.

John Stewart answered Oct 20 '22 11:10