I would like to generate a sentence from a set of input words. E.g.
Input:
Mary
chase
the monkey
Output:
Mary chases the monkey.
This can be done with the SimpleNLG library (http://code.google.com/p/simplenlg/) in the following way:
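// Setup (SimpleNLG v4): Lexicon, NLGFactory, Realiser and SPhraseSpec come from the simplenlg packages.
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);
SPhraseSpec p = nlgFactory.createClause();
String subject = "Mary";
String verb = "chase";
String object = "the monkey";
p.setSubject(subject);
p.setVerb(verb);
p.setObject(object);
String output = realiser.realiseSentence(p);
System.out.println(output);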
This generates the sentence "Mary chases the monkey." But I would like to automate it, so that I input the words and the sentence gets generated. This would require some preprocessing to determine which word is the subject, which is the verb and which is the object. I know there are POS (part-of-speech) tagging libraries, but they don't tell you whether a word is a subject or an object. Any suggestions on how this could be done? I would also like it to work for longer sentences with multiple objects, adverbs, etc.
In order to obtain the subject, verb or object for the input sentence you need to perform syntactic analysis or parsing.
There are two main groups of parsing tools, constituent parsers and dependency parsers, but usually the former is the more direct path to obtain what you need.
These are some research constituent parsers that you may try:
This related question may also help: Simple Natural Language Processing Startup for Java
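As a rough illustration of how a constituent parse exposes these roles, here is a minimal sketch. It assumes the parser has already produced a Penn Treebank style bracketed string whose root is a simple S with an NP and a VP child; the class name ParseRoles and the method roles are made up for this example and do not belong to any parser library. Real parser output is usually wrapped in extra nodes and covers many more clause shapes, so treat this only as a starting point.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseRoles {
    // A node in a bracketed constituency tree: a labelled constituent, or a leaf word.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }

        // Concatenate the words under this node, left to right.
        String text() {
            if (isLeaf()) return label;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(c.text());
            }
            return sb.toString();
        }

        // First direct non-leaf child whose label starts with the prefix, or null.
        Node child(String prefix) {
            for (Node c : children) {
                if (!c.isLeaf() && c.label.startsWith(prefix)) return c;
            }
            return null;
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private ParseRoles(String bracketed) {
        tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    // Recursive descent: "(" LABEL child* ")" or a bare word.
    private Node parseNode() {
        if (!tokens[pos].equals("(")) return new Node(tokens[pos++]);
        pos++;                               // consume "("
        Node node = new Node(tokens[pos++]); // constituent label
        while (!tokens[pos].equals(")")) node.children.add(parseNode());
        pos++;                               // consume ")"
        return node;
    }

    private static String text(Node n) { return n == null ? "" : n.text(); }

    // Returns {subject, verb, object} for a simple S -> NP VP clause; missing roles are "".
    public static String[] roles(String bracketed) {
        Node s = new ParseRoles(bracketed).parseNode();
        Node vp = s.child("VP");
        Node subject = s.child("NP");                     // NP directly under S
        Node verb = vp == null ? null : vp.child("VB");   // VB, VBZ, VBD, ...
        Node object = vp == null ? null : vp.child("NP"); // NP directly under VP
        return new String[] { text(subject), text(verb), text(object) };
    }

    public static void main(String[] args) {
        String[] r = roles("(S (NP (NNP Mary)) (VP (VBZ chases) (NP (DT the) (NN monkey))))");
        System.out.println(r[0] + " | " + r[1] + " | " + r[2]);
    }
}
```

The three strings could then be passed to setSubject, setVerb and setObject as in the question. Note that the verb comes out inflected ("chases"), so it may need to be reduced to its base form first.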
The most common approach is to build n-gram statistics and then construct the most probable sequence of words. One famous example can be found here: http://scribe.googlelabs.com/
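A minimal sketch of the idea, assuming bigram counts collected from a tiny hand-written corpus and a brute-force search over all orderings of the input words. The class name BigramOrdering, the training sentences and the add-one smoothed score are all choices made for this illustration. Trying every permutation is only feasible for a handful of words, and unlike SimpleNLG this approach only reorders the words, it does not inflect them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BigramOrdering {
    // Bigram counts keyed by "previous next"; "<s>" marks the sentence start.
    private final Map<String, Integer> counts = new HashMap<>();

    // Count every adjacent word pair in a whitespace-tokenised training sentence.
    public void train(String sentence) {
        String prev = "<s>";
        for (String word : sentence.trim().split("\\s+")) {
            counts.merge(prev + " " + word, 1, Integer::sum);
            prev = word;
        }
    }

    // Score an ordering as the sum of log(count + 1) over its bigrams
    // (an unnormalised, add-one smoothed stand-in for a log-probability).
    double score(List<String> words) {
        double total = 0.0;
        String prev = "<s>";
        for (String word : words) {
            total += Math.log(counts.getOrDefault(prev + " " + word, 0) + 1.0);
            prev = word;
        }
        return total;
    }

    // Try every permutation of the input words and keep the highest-scoring one.
    public List<String> bestOrder(List<String> words) {
        List<String> best = new ArrayList<>(words);
        double[] bestScore = { score(best) };
        permute(new ArrayList<>(words), 0, best, bestScore);
        return best;
    }

    private void permute(List<String> words, int start, List<String> best, double[] bestScore) {
        if (start == words.size()) {
            double s = score(words);
            if (s > bestScore[0]) {
                bestScore[0] = s;
                best.clear();
                best.addAll(words);
            }
            return;
        }
        for (int i = start; i < words.size(); i++) {
            Collections.swap(words, start, i);
            permute(words, start + 1, best, bestScore);
            Collections.swap(words, start, i); // restore before the next choice
        }
    }

    public static void main(String[] args) {
        BigramOrdering model = new BigramOrdering();
        model.train("Mary chases the monkey");
        model.train("the monkey eats the banana");
        model.train("Mary likes the banana");
        List<String> ordered = model.bestOrder(Arrays.asList("monkey", "the", "chases", "Mary"));
        System.out.println(String.join(" ", ordered)); // prints "Mary chases the monkey"
    }
}
```

With a realistic corpus the exhaustive search would normally be replaced by something like beam search, and the raw counts by properly smoothed probabilities.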