Given the following sentence:
The old oak tree from India fell down.
How can I get the following parse tree representation of the sentence using python NLTK?
(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))
I need a complete example which I couldn't find in web!
Edit
I have gone through this book chapter to learn about parsing using NLTK but the problem is, I need a grammar to parse sentences or phrases which I do not have. I have found this stackoverflow post which also asked about grammar for parsing but there is no convincing answer there.
So, I am looking for a complete answer that can give me the parse tree given a sentence.
fromlist() can be used to parse trees that are expressed as nested lists, such as those produced by the tree() function from the wordnet module.
NLTK Parsers. Classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as “parsing” the text, and the resulting tree structures are called the text's “parses”.
A Tree represents a hierarchical grouping of leaves and subtrees. For example, each constituent in a syntax tree is represented by a single Tree. A tree's children are encoded as a list of leaves and subtrees, where a leaf is a basic (non-tree) value; and a subtree is a nested Tree.
Parse tree is the hierarchical representation of terminals or non-terminals. These symbols (terminals or non-terminals) represent the derivation of the grammar to yield input strings. In parsing, the string springs using the beginning symbol.
Here is alternative solution using StanfordCoreNLP
instead of nltk
. There are few library that build on top of StanfordCoreNLP
, I personally use pycorenlp to parse the sentence.
First you have to download stanford-corenlp-full
folder where you have *.jar
file inside. And run the server inside the folder (default port is 9000).
export CLASSPATH="`find . -name '*.jar'`"
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer [port?] # run server
Then in Python, you can run the following in order to tag the sentence.
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
text = "The old oak tree from India fell down."
output = nlp.annotate(text, properties={
'annotators': 'parse',
'outputFormat': 'json'
})
print(output['sentences'][0]['parse']) # tagged output sentence
Older question, but you can use nltk together with the bllipparser. Here is a longer example from nltk. After some fiddling I myself used the following:
To install (with nltk already installed):
sudo python3 -m nltk.downloader bllip_wsj_no_aux
pip3 install bllipparser
To use:
from nltk.data import find
from bllipparser import RerankingParser
model_dir = find('models/bllip_wsj_no_aux').path
parser = RerankingParser.from_unified_model_dir(model_dir)
best = parser.parse("The old oak tree from India fell down.")
print(best.get_reranker_best())
print(best.get_parser_best())
Output:
-80.435259246021 -23.831876011253 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down))) (. .)))
-79.703612178593 -24.505514522222 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (ADVP (RB down))) (. .)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With