Is there a way to decompose complex sentences into simple sentences in nltk or other natural language processing libraries?
For example:
The park is so wonderful when the sun is setting and a cool breeze is blowing ==> The sun is setting. A cool breeze is blowing. The park is so wonderful.
This is much more complicated than it seems, so you're unlikely to find a perfectly clean method.
However, using the English parser in OpenNLP, I can take your example sentence and get the following parse tree:
(S
  (NP (DT The) (NN park))
  (VP
    (VBZ is)
    (ADJP (RB so) (JJ wonderful))
    (SBAR
      (WHADVP (WRB when))
      (S
        (S (NP (DT the) (NN sun)) (VP (VBZ is) (VP (VBG setting))))
        (CC and)
        (S
          (NP (DT a) (JJ cool) (NN breeze))
          (VP (VBZ is) (VP (VBG blowing)))))))
  (. .))
From there, you can pick it apart as you like. You get the main clause by extracting the top-level (NP *)(VP *) minus the (SBAR *) section. You can then split the conjunction inside the (SBAR *) into the other two statements.
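As a rough illustration of that idea, here is a minimal sketch that reads the bracketed tree and pulls out the three statements. It uses a tiny hand-rolled reader so it runs without any dependencies (nltk.Tree.fromstring would do the parsing step too). The helper names (read_tree, words, find, kids, split_clauses) are made up for this example, and the rules only cover trees shaped like the one above, so treat it as a starting point rather than a general solution.

```python
import re

TREE = """
(S
  (NP (DT The) (NN park))
  (VP
    (VBZ is)
    (ADJP (RB so) (JJ wonderful))
    (SBAR
      (WHADVP (WRB when))
      (S
        (S (NP (DT the) (NN sun)) (VP (VBZ is) (VP (VBG setting))))
        (CC and)
        (S
          (NP (DT a) (JJ cool) (NN breeze))
          (VP (VBZ is) (VP (VBG blowing)))))))
  (. .))
"""

def read_tree(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])   # a plain word (leaf)
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return parse()

def words(node, skip=()):
    """Collect the words left to right, ignoring subtrees whose label is in skip."""
    if isinstance(node, str):
        return [node]
    label, children = node
    if label in skip:
        return []
    out = []
    for child in children:
        out.extend(words(child, skip))
    return out

def find(node, wanted):
    """Return the first subtree labelled `wanted` (depth-first), or None."""
    if isinstance(node, str):
        return None
    if node[0] == wanted:
        return node
    for child in node[1]:
        hit = find(child, wanted)
        if hit is not None:
            return hit
    return None

def kids(node, label):
    """Direct child subtrees of `node` that carry the given label."""
    return [c for c in node[1] if not isinstance(c, str) and c[0] == label]

def split_clauses(tree):
    clauses = []
    sbar = find(tree, "SBAR")
    if sbar is not None:
        for inner in kids(sbar, "S"):
            # Coordinated clauses show up as S children of the inner S;
            # if there are none, the inner S is itself the only clause.
            parts = kids(inner, "S") or [inner]
            clauses.extend(" ".join(words(p)) for p in parts)
    # Main clause: everything at the top level except the SBAR and the period.
    clauses.append(" ".join(words(tree, skip=("SBAR", "."))))
    return [c[0].upper() + c[1:] + "." for c in clauses]

print(split_clauses(read_tree(TREE)))
```

On the example tree this gives "The sun is setting.", "A cool breeze is blowing." and "The park is so wonderful.", in that order. Sentences with a different structure (relative clauses, several SBARs, coordinated verb phrases) will need extra rules.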
Note that the OpenNLP parser is trained on the Penn Treebank corpus. The parse of your example sentence is pretty accurate, but the parser isn't perfect and can be wildly wrong on other sentences. See the Penn Treebank tag set documentation for an explanation of its tags; it assumes you already have some basic understanding of linguistics and English grammar.
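For reading the tree above, a small lookup of just the labels it contains may help. This is only a sketch: the table covers the labels in this one example and not the full Penn Treebank tag set, and the names PENN_TAGS, labels_in and explain are invented for illustration.

```python
import re

# Penn Treebank labels that appear in the example parse (not the full tag set).
PENN_TAGS = {
    "S": "simple declarative clause",
    "SBAR": "clause introduced by a subordinating conjunction",
    "NP": "noun phrase",
    "VP": "verb phrase",
    "ADJP": "adjective phrase",
    "WHADVP": "wh-adverb phrase",
    "DT": "determiner",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
    "VBG": "verb, gerund or present participle",
    "RB": "adverb",
    "JJ": "adjective",
    "CC": "coordinating conjunction",
    "WRB": "wh-adverb",
    ".": "sentence-final punctuation",
}

SAMPLE = "(S (NP (DT The) (NN park)) (VP (VBZ is) (ADJP (RB so) (JJ wonderful))) (. .))"

def labels_in(bracketed):
    """Distinct node labels of a bracketed parse, in order of first appearance."""
    seen = []
    for label in re.findall(r"\(([^\s()]+)", bracketed):
        if label not in seen:
            seen.append(label)
    return seen

def explain(bracketed):
    """Pair each label with its description, or a placeholder if it is not in the table."""
    return [(label, PENN_TAGS.get(label, "unknown (not in this small table)"))
            for label in labels_in(bracketed)]

for label, meaning in explain(SAMPLE):
    print("%-7s %s" % (label, meaning))
```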
Edit: Btw, this is how I access OpenNLP from Python. This assumes you have the OpenNLP jar and model files in an opennlp-tools-1.4.3 folder.
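import os
from subprocess import Popen, PIPE
import nltk

# Directory of this script; the OpenNLP jars and models are expected next to it.
BP = os.path.dirname(os.path.abspath(__file__))
CP = "%(BP)s/opennlp-tools-1.4.3.jar:%(BP)s/opennlp-tools-1.4.3/lib/maxent-2.5.2.jar:%(BP)s/opennlp-tools-1.4.3/lib/jwnl-1.3.3.jar:%(BP)s/opennlp-tools-1.4.3/lib/trove.jar" % dict(BP=BP)
cmd = "java -cp %(CP)s -Xmx1024m opennlp.tools.lang.english.TreebankParser -k 1 -d %(BP)s/opennlp.models/english/parser" % dict(CP=CP, BP=BP)

# Start the parser and talk to it over pipes; universal_newlines=True gives text-mode pipes on Python 3.
p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True, universal_newlines=True)
stdin, stdout, stderr = (p.stdin, p.stdout, p.stderr)

text = "This is my sample sentence."
stdin.write('%s\n' % text)
stdin.flush()  # make sure the sentence reaches the Java process before reading

# The output line is split on spaces: field 1 is read as the parse probability, the rest is the bracketed tree.
ret = stdout.readline()
ret = ret.split(' ')
prob = float(ret[1])
tree = nltk.Tree.fromstring(' '.join(ret[2:]))  # was nltk.Tree.parse() in NLTK 2.x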