I have been trying to find out how to get the dependency tree with spaCy, but I can't find anything on how to get the tree, only on how to navigate it.
Dependency parsing with spaCy defines the dependency relationships between head words and their dependents. The head of a sentence has no dependency of its own and is called the root of the sentence; the verb is usually the head. All other words are linked to a head word.
According to the official spaCy documentation, spaCy models are trained on Universal Dependencies corpora, which are language-specific.
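In code, each Token exposes this information directly: token.dep_ holds the relation label and token.head points to the parent token (the root of a sentence is its own head). Here is a minimal sketch of inspecting these properties; the sentence and model name are just examples, and the exact labels you get depend on the model and spaCy version installed:

import spacy

nlp = spacy.load('en_core_web_sm')          # any installed pipeline with a parser works
doc = nlp('spaCy builds a dependency tree.')

# Each token knows its relation label and its head (parent) token;
# the sentence root is its own head.
for token in doc:
    print(token.text, token.dep_, token.head.text)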
The tree isn't an object in itself; you just navigate it via the relationships between tokens. That's why the docs talk about navigating the tree, but not 'getting' it.
First, let's parse some text to get a Doc object:
>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> doc = nlp('First, I wrote some sentences. Then spaCy parsed them. Hooray!')
doc is a Sequence of Token objects:
>>> doc[0]
First
>>> doc[1]
,
>>> doc[2]
I
>>> doc[3]
wrote
But it doesn't have a single root token. We parsed a text made up of three sentences, so there are three distinct trees, each with its own root. If we want to start our parsing from the root of each sentence, it will help to get the sentences as distinct objects first. Fortunately, doc exposes these to us via the .sents property:
>>> sentences = list(doc.sents)
>>> for sentence in sentences:
...     print(sentence)
...
First, I wrote some sentences.
Then spaCy parsed them.
Hooray!
Each of these sentences is a Span with a .root property pointing to its root token. Usually, the root token will be the main verb of the sentence (although this may not be true for unusual sentence structures, such as sentences without a verb):
>>> for sentence in sentences:
...     print(sentence.root)
...
wrote
parsed
Hooray
With the root token found, we can navigate down the tree via the .children property of each token. For instance, let's find the subject and object of the verb in the first sentence. The .dep_ property of each child token describes its relationship with its parent; for instance, a dep_ of 'nsubj' means that a token is the nominal subject of its parent.
>>> root_token = sentences[0].root
>>> for child in root_token.children:
...     if child.dep_ == 'nsubj':
...         subj = child
...     if child.dep_ == 'dobj':
...         obj = child
...
>>> subj
I
>>> obj
sentences
We can likewise keep going down the tree by viewing the children of one of these tokens:
>>> list(obj.children)
[some]
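If you want to walk all the way down from the root in one go, a small recursive helper over .children will do it. walk_tree below is just a sketch of mine, not part of spaCy's API, and the exact labels in the output depend on the model:

>>> def walk_tree(token, depth=0):
...     # print the token and its relation label, indented by depth, then recurse
...     print('    ' * depth + f'{token.text} ({token.dep_})')
...     for child in token.children:
...         walk_tree(child, depth + 1)
...
>>> walk_tree(sentences[0].root)
wrote (ROOT)
    First (advmod)
    , (punct)
    I (nsubj)
    sentences (dobj)
        some (det)
    . (punct)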
Thus with the properties above, you can navigate the entire tree. If you want to visualise some dependency trees for example sentences to help you understand the structure, I recommend playing with displaCy.
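displaCy ships with spaCy itself, so you can render the trees for the running example directly. A minimal sketch (in a Jupyter notebook, use displacy.render instead of displacy.serve):

>>> from spacy import displacy
>>> displacy.serve(doc, style='dep')   # starts a local web server and draws one tree per sentence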
In case someone wants to easily view the dependency tree produced by spaCy, one solution would be to convert it into an nltk.tree.Tree and use the nltk.tree.Tree.pretty_print method. Here is an example:
import spacy
from nltk import Tree

en_nlp = spacy.load('en_core_web_sm')   # the old 'en' shortcut no longer works in recent spaCy versions
doc = en_nlp("The quick brown fox jumps over the lazy dog.")

def to_nltk_tree(node):
    # Recursively convert a spaCy token and its children into an nltk Tree
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]
Output:
                jumps
  ________________|____________
 |    |     |     |    |      over
 |    |     |     |    |       |
 |    |     |     |    |      dog
 |    |     |     |    |    ___|____
The quick brown  fox   .  the     lazy
Edit: To change the token representation, you can do this:
def tok_format(tok):
    return "_".join([tok.orth_, tok.tag_])

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(tok_format(node), [to_nltk_tree(child) for child in node.children])
    else:
        return tok_format(node)
Which results in:
                          jumps_VBZ
    __________________________|___________________
   |        |         |       |    |           over_IN
   |        |         |       |    |              |
   |        |         |       |    |            dog_NN
   |        |         |       |    |        _______|_______
The_DT  quick_JJ  brown_JJ  fox_NN ._.    the_DT         lazy_JJ