
Is there any Phrase head finder?

Tags: java, nlp

I have some sentences that I want to parse, such as these:

I was in the hospital.

I was going from home to Canada.

I want to find the heads of the phrases "in the hospital", "from home", and "to Canada".

I am using the Berkeley Parser, but it gives me the parse of the whole sentence, so to extract the heads of phrases I would have to write another parser on top of its output. The file I want to process is very large, and a parser I write myself is likely to contain many errors. Is there a parser that can give me the result I am looking for?

By the way, because parsing a phrase on its own may give a different result from parsing it as part of its sentence, I want to parse the full sentences first and then extract the phrase heads.

user1419243 asked Jan 16 '23 13:01


2 Answers

The Stanford Parser gives you part-of-speech tags, a phrase-structure tree, and dependency information, and the OpenNLP parser gives you part-of-speech tags and a phrase-structure tree. You can use this output to determine the head of a phrase.

For example, using the Stanford parser, you would get:

(S
  (NP (PRP I))
  (VP (VBD was)
      (PP (IN in)
          (NP (DT the)
              (NN hospital)))))

This tells you that the sentence (S) consists of a noun phrase (NP) and a verb phrase (VP). The verb phrase consists of a verb (V*) followed by a prepositional phrase (PP), which in turn consists of the preposition "in" (IN) and a noun phrase. That second noun phrase consists of a determiner (DT) and a noun (NN).
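
To pull phrases such as "in the hospital" out of a parse like the one above without writing a full parser, a small reader for the bracketed format is usually enough. The following is a minimal sketch in plain Java, assuming the parser output is a well-formed bracketed string; the class name BracketTreeSketch and its methods are hypothetical and are not part of the Stanford, Berkeley, or OpenNLP APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class BracketTreeSketch {
    /** A tree node: a label plus children. A leaf has no children and its label is the word. */
    public static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    private final String[] tokens;
    private int pos = 0;

    private BracketTreeSketch(String text) {
        // Pad brackets with spaces so that splitting on whitespace yields clean tokens.
        this.tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    /** Reads one bracketed tree; assumes the input is well formed. */
    public static Node parse(String text) {
        return new BracketTreeSketch(text).parseNode();
    }

    private Node parseNode() {
        if (!tokens[pos].equals("(")) {
            return new Node(tokens[pos++]);   // a bare token is a word
        }
        pos++;                                 // consume "("
        Node node = new Node(tokens[pos++]);   // the label follows the bracket
        while (!tokens[pos].equals(")")) {
            node.children.add(parseNode());
        }
        pos++;                                 // consume ")"
        return node;
    }

    /** Returns the word sequence of every constituent carrying the given label. */
    public static List<String> phrases(Node root, String label) {
        List<String> result = new ArrayList<>();
        collect(root, label, result);
        return result;
    }

    private static void collect(Node node, String label, List<String> result) {
        if (!node.isLeaf() && node.label.equals(label)) {
            List<String> words = new ArrayList<>();
            collectWords(node, words);
            result.add(String.join(" ", words));
        }
        for (Node child : node.children) collect(child, label, result);
    }

    private static void collectWords(Node node, List<String> out) {
        if (node.isLeaf()) out.add(node.label);
        else for (Node child : node.children) collectWords(child, out);
    }

    public static void main(String[] args) {
        Node root = parse("(S (NP (PRP I)) (VP (VBD was) (PP (IN in) (NP (DT the) (NN hospital)))))");
        System.out.println(phrases(root, "PP")); // [in the hospital]
        System.out.println(phrases(root, "NP")); // [I, the hospital]
    }
}
```

Because the phrases are read from the sentence-level parse, this keeps the requirement from the question that sentences are parsed whole and phrases are extracted afterwards.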

If I understand the question properly, you are looking for the heads of noun-phrases (and possibly the verb-phrases). You can identify the head from this information already, but the parser gives you the following dependency information as well:

nsubj(was, I)
prep_in(was, hospital)
det(hospital, the)

This tells you that the words "was" and "I" are in a nominal-subject (nsubj) relationship ("I" is the subject of the verb "was"); the words "was" and "hospital" are in an "in" preposition (prep_in) relationship; and the words "hospital" and "the" are in a determiner (det) relationship. Combining the parse tree with the dependency information, you can tell that the head of the first noun phrase is "I" (trivial) and the head of the second noun phrase is "hospital", because it is the "top" element of the relations within that noun phrase.
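
The "top element" idea can be sketched in a few lines: within a phrase, the head is the word that is not governed by any other word of the same phrase. The code below is a minimal illustration in plain Java, not parser API; the Dep class and the heads method are hypothetical names, and the sketch assumes each word occurs only once in the sentence (real parser output attaches token indices, which should be used instead of the word strings).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DependencyHeadSketch {
    /** One typed dependency such as det(hospital, the). */
    public static final class Dep {
        final String relation, governor, dependent;
        public Dep(String relation, String governor, String dependent) {
            this.relation = relation;
            this.governor = governor;
            this.dependent = dependent;
        }
    }

    /**
     * Returns the words of the phrase that take part in some dependency but are
     * not governed by another word of the same phrase (normally exactly one).
     * Words that appear in no dependency at all, such as a preposition folded
     * into a collapsed relation like prep_in, are ignored.
     */
    public static Set<String> heads(List<String> phraseWords, List<Dep> deps) {
        Set<String> inside = new HashSet<>(phraseWords);
        Set<String> candidates = new LinkedHashSet<>();
        for (Dep d : deps) {
            if (inside.contains(d.governor)) candidates.add(d.governor);
            if (inside.contains(d.dependent)) candidates.add(d.dependent);
        }
        for (Dep d : deps) {
            if (inside.contains(d.governor) && inside.contains(d.dependent)) {
                candidates.remove(d.dependent); // governed from within the phrase
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<Dep> deps = Arrays.asList(
                new Dep("nsubj", "was", "I"),
                new Dep("prep_in", "was", "hospital"),
                new Dep("det", "hospital", "the"));
        System.out.println(heads(Arrays.asList("in", "the", "hospital"), deps));        // [hospital]
        System.out.println(heads(Arrays.asList("was", "in", "the", "hospital"), deps)); // [was]
    }
}
```

With collapsed dependencies like those shown above, this returns the noun ("hospital") for the prepositional phrase "in the hospital", since the preposition itself only survives as part of the relation name.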

Attila answered Jan 20 '23 16:01


Finding the head word of a phrase is not as simple as the response by Attila suggests. Prof. Michael Collins has a list of heuristics for finding the head word (the heuristics are based on the Penn Treebank dataset), and an implementation of these heuristics is available in the Stanford CoreNLP suite (I checked the 20140104 version).

The response given here has more details about the classes in Stanford CoreNLP that do the head word finding for you.
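
To give a feel for how rule-based head finding works, here is a minimal sketch in plain Java. It does not use CoreNLP, and the rule table is a small illustrative assumption loosely in the style of such heuristics, not the actual table from Collins or from CoreNLP, which covers many more categories and special cases. Each phrase label gets a search direction and a priority list of child labels; the first matching child is the head child, and the head word is found by following head children down to a leaf. The names HeadRuleSketch, Rule, headWord, and contentHead are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeadRuleSketch {
    /** A tree node; a node built without children is a word. */
    public static final class Node {
        final String label;
        final List<Node> children;
        public Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Search direction plus a priority list of child labels for one phrase category. */
    static final class Rule {
        final boolean leftToRight;
        final String[] priorities;
        Rule(boolean leftToRight, String... priorities) {
            this.leftToRight = leftToRight;
            this.priorities = priorities;
        }
    }

    // Illustrative table only (an assumption for this sketch, not the published rules).
    static final Map<String, Rule> RULES = new HashMap<>();
    static {
        RULES.put("S",  new Rule(true,  "VP", "S"));
        RULES.put("VP", new Rule(true,  "VBD", "VBZ", "VBP", "VB", "VBG", "VBN", "MD", "VP"));
        RULES.put("NP", new Rule(false, "NN", "NNS", "NNP", "NNPS", "PRP", "NP"));
        RULES.put("PP", new Rule(true,  "IN", "TO"));
    }

    static Node headChild(Node node) {
        Rule rule = RULES.get(node.label);
        List<Node> kids = node.children;
        if (rule != null) {
            for (String wanted : rule.priorities) {
                for (int i = 0; i < kids.size(); i++) {
                    Node kid = kids.get(rule.leftToRight ? i : kids.size() - 1 - i);
                    if (kid.label.equals(wanted)) return kid;
                }
            }
        }
        // Fallback: the first child in the search direction (leftmost if no rule exists).
        return (rule == null || rule.leftToRight) ? kids.get(0) : kids.get(kids.size() - 1);
    }

    /** Follows head children down to a word. */
    public static String headWord(Node node) {
        while (!node.isLeaf()) node = headChild(node);
        return node.label;
    }

    /** For a PP, returns the head of its NP complement instead of the preposition. */
    public static String contentHead(Node node) {
        if (node.label.equals("PP")) {
            for (Node kid : node.children) {
                if (kid.label.equals("NP")) return headWord(kid);
            }
        }
        return headWord(node);
    }

    public static void main(String[] args) {
        Node pp = new Node("PP", new Node("IN", new Node("in")),
                new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("hospital"))));
        Node s = new Node("S", new Node("NP", new Node("PRP", new Node("I"))),
                new Node("VP", new Node("VBD", new Node("was")), pp));
        System.out.println(headWord(pp));    // in
        System.out.println(contentHead(pp)); // hospital
        System.out.println(headWord(s));     // was
    }
}
```

Note the design choice in the PP rule: syntactic head rules of this kind pick the preposition as the head of a prepositional phrase, so if the noun is what you need (as the question seems to imply), take the head of the noun phrase inside the PP, as contentHead does here.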

TheGT answered Jan 20 '23 14:01