NLG: building a sentence

Tags:

nlp

I would like to generate a sentence from a set of input words. For example:

Input:

Mary
chase
the monkey

Output:

Mary chases the monkey.

This can be done with the SimpleNLG library (http://code.google.com/p/simplenlg/) as follows:

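import simplenlg.framework.NLGFactory;
import simplenlg.lexicon.Lexicon;
import simplenlg.phrasespec.SPhraseSpec;
import simplenlg.realiser.english.Realiser;

// Setup as in the SimpleNLG v4 tutorial: lexicon, phrase factory, realiser
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);

String subject = "Mary";
String verb = "chase";
String object = "the monkey";

SPhraseSpec p = nlgFactory.createClause();  // empty clause to fill in
p.setSubject(subject);
p.setVerb(verb);
p.setObject(object);
String output = realiser.realiseSentence(p);
System.out.println(output);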

This generates the sentence "Mary chases the monkey." However, I would like to automate this, so that I input words and the sentence is generated. That would require a preprocessing step that determines which word is the subject, which is the verb, and which is the object. I know there are POS (part-of-speech) tagging libraries, but they do not say whether a word is a subject or an object. Any suggestions on how this could be done? Also, how could it be made to work for longer sentences with multiple objects, adverbs, etc.?

Radek asked Jun 02 '11 11:06



2 Answers

To identify the subject, verb, and object of an input sentence, you need to perform syntactic analysis (parsing).

There are two main families of parsing tools: constituency parsers and dependency parsers. Usually the former is the more direct route to what you need.

These are some research constituent parsers that you may try:

  • Stanford parser
  • Berkeley parser
  • BUBS parser

This related question may also help: Simple Natural Language Processing Startup for Java
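
As a rough illustration of how a constituency parse exposes these roles, here is a minimal Java sketch. It does not call any of the parsers listed above. It assumes you already have a Penn Treebank-style bracketed parse as a string (the kind of output such parsers typically produce) and that the clause has the simple shape S -> NP VP with VP -> verb NP. The class and method names (Node, SvoFromParse, extract) are made up for this example, and the parse string is hand-written.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal tree node for a Penn Treebank-style bracketed parse. */
class Node {
    final String label;   // phrase or POS tag, e.g. "NP", "VBZ"
    final String word;    // non-null only for leaves such as (NNP Mary)
    final List<Node> children = new ArrayList<>();

    Node(String label, String word) {
        this.label = label;
        this.word = word;
    }

    /** Joins all words under this node with single spaces. */
    String text() {
        if (word != null) return word;
        StringBuilder sb = new StringBuilder();
        for (Node c : children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.text());
        }
        return sb.toString();
    }
}

/** Hypothetical helper: reads a bracketed parse and picks out subject, verb, object. */
class SvoFromParse {
    private final String[] tokens;
    private int pos = 0;

    private SvoFromParse(String bracketed) {
        // Put spaces around brackets so that a whitespace split yields clean tokens.
        tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    static Node parse(String bracketed) {
        return new SvoFromParse(bracketed).readNode();
    }

    private Node readNode() {
        pos++;                                  // skip "("
        String label = tokens[pos++];
        if (!tokens[pos].equals("(")) {         // preterminal: (TAG word)
            Node leaf = new Node(label, tokens[pos++]);
            pos++;                              // skip ")"
            return leaf;
        }
        Node node = new Node(label, null);
        while (tokens[pos].equals("(")) node.children.add(readNode());
        pos++;                                  // skip ")"
        return node;
    }

    private static Node firstChild(Node parent, String labelPrefix) {
        if (parent == null) return null;
        for (Node c : parent.children)
            if (c.label.startsWith(labelPrefix)) return c;
        return null;
    }

    private static String textOf(Node n) {
        return n == null ? null : n.text();
    }

    /** Returns {subject, verb, object}; only handles S -> NP VP, VP -> VB* NP. */
    static String[] extract(Node s) {
        Node vp = firstChild(s, "VP");
        return new String[] {
            textOf(firstChild(s, "NP")),        // NP directly under S: subject
            textOf(firstChild(vp, "VB")),       // verb tag directly under VP
            textOf(firstChild(vp, "NP"))        // NP directly under VP: object
        };
    }

    public static void main(String[] args) {
        String tree = "(S (NP (NNP Mary)) (VP (VBZ chases) (NP (DT the) (NN monkey))))";
        System.out.println(String.join(" | ", extract(parse(tree))));
    }
}
```

For this input it prints `Mary | chases | the monkey`. The verb comes out inflected, so it may need lemmatising before being handed to something like setVerb, and passives, coordination, or embedded clauses would need more rules than this.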

zdepablo answered Sep 28 '22 17:09



The most common approach is to collect n-gram statistics and then build the most probable sequence of words. One famous example can be found here: http://scribe.googlelabs.com/
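
To make the idea concrete, here is a small Java sketch of bigram-based ordering. It is only an illustration under stated assumptions: the three training sentences are a made-up toy corpus, the class name BigramOrderer is hypothetical, the score is a crude sum of log(1 + count) rather than a proper probability, and trying every permutation is only feasible for a handful of words. It also does not inflect words, so the input already contains "chases".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model: orders words by how often adjacent pairs were seen. */
class BigramOrderer {
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Counts adjacent word pairs; <s> and </s> mark sentence boundaries. */
    void train(List<String> sentences) {
        for (String s : sentences) {
            String[] w = ("<s> " + s.toLowerCase() + " </s>").split("\\s+");
            for (int i = 0; i + 1 < w.length; i++)
                bigramCounts.merge(w[i] + " " + w[i + 1], 1, Integer::sum);
        }
    }

    /** Sum of log(1 + count) over adjacent pairs; unseen pairs contribute 0. */
    double score(List<String> words) {
        List<String> padded = new ArrayList<>();
        padded.add("<s>");
        padded.addAll(words);
        padded.add("</s>");
        double total = 0;
        for (int i = 0; i + 1 < padded.size(); i++) {
            String key = padded.get(i) + " " + padded.get(i + 1);
            total += Math.log(1 + bigramCounts.getOrDefault(key, 0));
        }
        return total;
    }

    /** Returns the highest-scoring permutation of the input words. */
    List<String> bestOrder(List<String> words) {
        List<String> best = new ArrayList<>(words);
        double[] bestScore = { score(best) };
        permute(new ArrayList<>(words), 0, best, bestScore);
        return best;
    }

    private void permute(List<String> w, int k, List<String> best, double[] bestScore) {
        if (k == w.size()) {
            double s = score(w);
            if (s > bestScore[0]) {
                bestScore[0] = s;
                best.clear();
                best.addAll(w);
            }
            return;
        }
        for (int i = k; i < w.size(); i++) {
            Collections.swap(w, k, i);          // choose w[i] for position k
            permute(w, k + 1, best, bestScore);
            Collections.swap(w, k, i);          // undo the choice
        }
    }

    public static void main(String[] args) {
        BigramOrderer model = new BigramOrderer();
        model.train(Arrays.asList(
            "Mary chases the monkey",
            "The monkey eats a banana",
            "Mary sees the dog"));
        List<String> ordered = model.bestOrder(Arrays.asList("monkey", "chases", "the", "mary"));
        System.out.println(String.join(" ", ordered));
    }
}
```

With this toy corpus it prints `mary chases the monkey`, because that is the only ordering in which every adjacent pair was seen during training. A real system would use much larger counts and a search such as beam search instead of brute force.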

yura answered Sep 28 '22 19:09
