Using NLTK's StanfordParser, I can parse a sentence like this:
import os
from nltk.parse import stanford

# Raw strings keep the backslashes in Windows paths literal
os.environ['STANFORD_PARSER'] = r'C:\jars'
os.environ['STANFORD_MODELS'] = r'C:\jars'
os.environ['JAVAHOME'] = r'C:\ProgramData\Oracle\Java\javapath'
parser = stanford.StanfordParser(model_path=r"C:\jars\englishPCFG.ser.gz")
sentences = parser.parse(("bring me a red ball",))
for sentence in sentences:
    sentence
The result is:
Tree('ROOT', [Tree('S', [Tree('VP', [Tree('VB', ['Bring']),
Tree('NP', [Tree('DT', ['a']), Tree('NN', ['red'])]), Tree('NP',
[Tree('NN', ['ball'])])]), Tree('.', ['.'])])])
How can I use the Stanford parser to get typed dependencies in addition to the above graph? Something like:
In dependency parsing, tags represent the relationship between two words in a sentence; these are the dependency tags. For example, in the phrase 'rainy weather', the word 'rainy' modifies the meaning of the noun 'weather'.
A dependency parser analyzes the grammatical structure of a sentence, establishing relationships between "head" words and words which modify those heads.
More formally, a dependency parse tree is a graph whose vertices are the words in the sentence and whose edges each connect two words. The graph must satisfy three conditions, the first of which is that there is a single root node with no incoming edges.
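The head/dependent structure described above can be sketched in plain Python. This is only an illustration, not Stanford parser output: the `Dependency` tuple, the `find_roots` helper, and the hand-written `amod` edge for 'rainy weather' are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical edge type for illustration: head word -> dependent word, plus a relation label.
Dependency = namedtuple("Dependency", ["head", "dependent", "label"])

def find_roots(words, edges):
    """Return the words that have no incoming edge (they are nobody's dependent)."""
    dependents = {edge.dependent for edge in edges}
    return [word for word in words if word not in dependents]

# Hand-written example: in "rainy weather", "rainy" modifies the noun "weather".
words = ["rainy", "weather"]
edges = [Dependency(head="weather", dependent="rainy", label="amod")]

roots = find_roots(words, edges)
# A well-formed dependency tree has exactly one such root.
has_single_root = len(roots) == 1
print(roots, has_single_root)
```

Words are identified by their string here for brevity; a real sentence can repeat a word, so an actual representation would key on token positions instead.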
The parser can read various forms of plain text input and can output various analysis formats, including part-of-speech tagged text, phrase structure trees, and a grammatical relations (typed dependency) format.
NLTK's StanfordParser module doesn't (currently) wrap the code that converts trees to Stanford Dependencies. You can use my library, PyStanfordDependencies, which wraps the dependency converter.
If nltk_tree is sentence from the question's code snippet, then this works:
#!/usr/bin/python3
import StanfordDependencies

# Use str() to convert the NLTK tree to Penn Treebank format
penn_treebank_tree = str(nltk_tree)

sd = StanfordDependencies.get_instance(jar_filename='point to Stanford Parser JAR file')
converted_tree = sd.convert_tree(penn_treebank_tree)

# Print Typed Dependencies
for node in converted_tree:
    print('{}({}-{},{}-{})'.format(
        node.deprel,
        converted_tree[node.head - 1].form if node.head != 0 else 'ROOT',
        node.head,
        node.form,
        node.index))
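To see what the loop above prints without a Java setup, the same formatting logic can be run on stand-in tokens. The sketch below models only the four fields the loop uses (`index`, `form`, `head`, `deprel`); the `Token` tuple, the `format_dependencies` helper, and the two tokens for 'rainy weather' are hand-written for illustration and are not output from the converter.

```python
from collections import namedtuple

# Stand-in for the converter's tokens; only the fields used by the loop are modelled.
Token = namedtuple("Token", ["index", "form", "head", "deprel"])

def format_dependencies(tokens):
    """Render each token as deprel(head_form-head_index,form-index)."""
    lines = []
    for node in tokens:
        # head is a 1-based token index; 0 marks the root of the sentence.
        head_form = tokens[node.head - 1].form if node.head != 0 else "ROOT"
        lines.append("{}({}-{},{}-{})".format(
            node.deprel, head_form, node.head, node.form, node.index))
    return lines

# Hand-written tokens (not parser output): "rainy" modifies "weather".
tokens = [
    Token(index=1, form="rainy", head=2, deprel="amod"),
    Token(index=2, form="weather", head=0, deprel="root"),
]

dependency_lines = format_dependencies(tokens)
for line in dependency_lines:
    print(line)
```

Collecting the lines in a list before printing makes the output easy to check or write to a file.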