Dependency parsing tree in Spacy

I have the sentence "John saw a flashy hat at the store".
How can I represent it as a dependency tree like the one shown below?

(S
      (NP (NNP John))
      (VP
        (VBD saw)
        (NP (DT a) (JJ flashy) (NN hat))
        (PP (IN at) (NP (DT the) (NN store)))))

I got this script from here

import spacy
from nltk import Tree

# spaCy v3 removed the 'en' shortcut; load an installed model by its full name
en_nlp = spacy.load('en_core_web_sm')

doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    # A token with children becomes a subtree; a leaf is just its text
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()

I get the following output, but I am looking for the bracketed (NLTK) tree format shown above.

     saw                 
  ____|_______________    
 |        |           at 
 |        |           |   
 |       hat        store
 |     ___|____       |   
John  a      flashy  the

asked Mar 16 '17 02:03

Niranjan Sonachalam

2 Answers

To re-create an NLTK-style tree for spaCy dependency parses, try using the draw method from nltk.tree instead of pretty_print:

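import spacy
from nltk.tree import Tree

# spaCy v3 removed the "en" shortcut; load an installed model by its full name
spacy_nlp = spacy.load("en_core_web_sm")

def nltk_spacy_tree(sent):
    """
    Visualize the spaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(sent)

    def token_format(token):
        # Label each node as word_TAG_dep, e.g. saw_VBD_ROOT
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        # A token with children becomes a subtree; a leaf is just its label
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                        [to_nltk_tree(child) for child in node.children])
        else:
            return token_format(node)

    # One tree per sentence; the loop variable no longer shadows the argument
    trees = [to_nltk_tree(s.root) for s in doc.sents]
    # The first item in the list is the tree for the first sentence.
    # draw() opens a GUI window, so it needs a display and tkinter.
    trees[0].draw()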

Note that spaCy provides dependency parsing and tagging at the word and noun-phrase level rather than a full constituency parse, so spaCy trees won't be as deeply structured as the ones you'd get from, for instance, the Stanford parser, which you can also visualize as a tree:

from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
# Newer NLTK releases deprecate this class in favor of
# nltk.parse.corenlp.CoreNLPParser
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford constituency parse with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # The first item in the list is the full tree
    tree[0].draw()

Now if we run both, nltk_spacy_tree("John saw a flashy hat at the store.") opens a window showing the spaCy dependency tree, with each node labeled word_tag_dep, and nltk_stanford_tree("John saw a flashy hat at the store.") opens one showing the Stanford phrase-structure tree.
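
If a text rendering is enough, the same recursion can emit the bracketed form directly instead of opening a drawing window. The following is a minimal sketch in plain Python, not spaCy or NLTK code: the TOKENS list is a hand-written stand-in for the parse, with head indices read off the pretty_print output in the question, and children_of and to_bracketed are illustrative names. With a real nltk Tree, print(tree) gives the same kind of bracketed string.

```python
# Hand-written stand-in for a dependency parse (an assumption, not spaCy output):
# each entry is (word, index of its head), with -1 marking the root.
TOKENS = [
    ("John", 1), ("saw", -1), ("a", 4), ("flashy", 4),
    ("hat", 1), ("at", 1), ("the", 7), ("store", 5),
]

def children_of(tokens, head_index):
    """Return the indices of tokens whose head is head_index, in sentence order."""
    return [i for i, (_, head) in enumerate(tokens) if head == head_index]

def to_bracketed(tokens, index):
    """Render the subtree rooted at tokens[index] as a bracketed string."""
    word = tokens[index][0]
    kids = children_of(tokens, index)
    if not kids:
        return word  # a leaf is just the word itself
    parts = [word] + [to_bracketed(tokens, k) for k in kids]
    return "(" + " ".join(parts) + ")"

root = children_of(TOKENS, -1)[0]
print(to_bracketed(TOKENS, root))
```

Note that the result is still a dependency tree, so every node is a word; it has no S, NP, or VP labels.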

answered Oct 18 '22 01:10

rebeccabilbro

Text representations aside, what you're trying to achieve is to get a constituency tree out of a dependency graph. Your example of desired output is a classic constituency tree (as in phrase structure grammar, as opposed to dependency grammar).

While the conversion from constituency trees into dependency graphs is more-or-less an automated task (for instance, http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf), the other direction is not. There has been work on it: see the PAD project https://github.com/ikekonglp/PAD and the paper describing the underlying algorithm: http://homes.cs.washington.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf.
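
To make the easier direction concrete, here is a toy sketch of head-rule conversion applied to the constituency tree from the question. The HEAD_RULES table is an assumption that covers only the labels in this one sentence (real converters use far richer rules), and words stand in for token ids, which only works because no word repeats here.

```python
import re

# Toy head rules (an assumption for this sentence only): for each phrase
# label, the child labels to try, in priority order, when picking the head.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD"],
    "NP": ["NN", "NNP", "NP"],
    "PP": ["IN"],
}

TREE = """(S (NP (NNP John))
   (VP (VBD saw)
       (NP (DT a) (JJ flashy) (NN hat))
       (PP (IN at) (NP (DT the) (NN store)))))"""

def parse_brackets(text):
    """Parse a bracketed tree into nested (label, children) tuples.
    A POS-tag node has a single string child: the word."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is the opening "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()

def head_word(node, arcs):
    """Return the head word of node, appending (dependent, head) pairs to arcs."""
    label, children = node
    if isinstance(children[0], str):  # POS-tag node: its word is the head
        return children[0]
    heads = [head_word(child, arcs) for child in children]
    labels = [child[0] for child in children]
    head_pos = len(children) - 1  # fallback: the last child is the head
    for wanted in HEAD_RULES.get(label, []):
        if wanted in labels:
            head_pos = labels.index(wanted)
            break
    # Every non-head child's head word depends on the head child's head word
    for i, word in enumerate(heads):
        if i != head_pos:
            arcs.append((word, heads[head_pos]))
    return heads[head_pos]

arcs = []
root = head_word(parse_brackets(TREE), arcs)
print("root:", root)
for dependent, head in arcs:
    print(dependent, "->", head)
```

The arcs it prints match the tree in the question's pretty_print output. Going the other way would mean inventing the phrase labels and nesting that the dependency arcs do not record.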

You may also want to reconsider whether you really need a constituency parse. Here is a good argument: https://linguistics.stackexchange.com/questions/7280/why-is-constituency-needed-since-dependency-gets-the-job-done-more-easily-and-e

answered Oct 18 '22 01:10

adam.ra

Tags: nlp, spacy, dependency-parsing
