
Dependency parsing tree in Spacy

I have a sentence: John saw a flashy hat at the store
How can I represent this as a dependency tree, as shown below?

(S
      (NP (NNP John))
      (VP
        (VBD saw)
        (NP (DT a) (JJ flashy) (NN hat))
        (PP (IN at) (NP (DT the) (NN store)))))

I got this script from here

import spacy
from nltk import Tree
# The 'en' shortcut was removed in spaCy 3; load an installed model package by name.
en_nlp = spacy.load('en_core_web_sm')

doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


# Print one tree per sentence.
for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()

I get the following output, but I am looking for the bracketed tree (NLTK) format shown above.

     saw                 
  ____|_______________    
 |        |           at 
 |        |           |   
 |       hat        store
 |     ___|____       |   
John  a      flashy  the
Niranjan Sonachalam asked Mar 16 '17 02:03




2 Answers

To draw an NLTK-style tree for a spaCy dependency parse, use the draw method of nltk.tree.Tree instead of pretty_print:

import spacy
from nltk.tree import Tree

# The "en" shortcut was removed in spaCy 3; load an installed model package by name.
spacy_nlp = spacy.load("en_core_web_sm")

def nltk_spacy_tree(sent):
    """
    Visualize the SpaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(sent)
    def token_format(token):
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                       [to_nltk_tree(child) 
                        for child in node.children]
                   )
        else:
            return token_format(node)

    tree = [to_nltk_tree(sent.root) for sent in doc.sents]
    # The first item in the list is the full tree
    tree[0].draw()

Note that spaCy currently supports only dependency parsing and tagging at the word and noun-phrase level, so its trees won't be as deeply structured as the ones you'd get from, for instance, the Stanford parser, which produces a constituency parse that you can also visualize as a tree:

from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford constituency parse with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # The first item in the list is the full tree
    tree[0].draw()

Now if we run both, nltk_spacy_tree("John saw a flashy hat at the store.") will produce this image and nltk_stanford_tree("John saw a flashy hat at the store.") will produce this one.
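If you want text rather than a drawing window, the same recursion can build a one-line bracketed string. This is only a sketch: the Node class below is a hypothetical stand-in that mimics the spaCy token attributes used above (orth_, tag_, dep_, children) so the example runs without a model, and the tags and dependency labels in the hand-built tree are illustrative assumptions, not actual parser output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a spaCy token (same attribute names)."""
    orth_: str
    tag_: str
    dep_: str
    children: List["Node"] = field(default_factory=list)


def to_bracketed(node) -> str:
    """Render a dependency subtree as a one-line bracketed string."""
    label = "_".join([node.orth_, node.tag_, node.dep_])
    kids = [to_bracketed(child) for child in node.children]
    if not kids:
        # Leaves are written without parentheses.
        return label
    return "(" + " ".join([label] + kids) + ")"


# Hand-built tree for "John saw a flashy hat at the store".
# The tags and dependency labels here are assumptions for illustration.
hat = Node("hat", "NN", "dobj",
           [Node("a", "DT", "det"), Node("flashy", "JJ", "amod")])
store = Node("store", "NN", "pobj", [Node("the", "DT", "det")])
at = Node("at", "IN", "prep", [store])
root = Node("saw", "VBD", "ROOT", [Node("John", "NNP", "nsubj"), hat, at])

print(to_bracketed(root))
```

With a real parse you would pass sent.root for each sentence in doc.sents instead of the hand-built root. The result is still a dependency tree (every node is a word), so it will not contain phrase labels such as NP or VP.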

rebeccabilbro answered Oct 18 '22 01:10


Text representations aside, what you're trying to achieve is to get a constituency tree out of a dependency graph. Your example of desired output is a classic constituency tree (as in phrase structure grammar, as opposed to dependency grammar).

While converting constituency trees into dependency graphs is a more-or-less automated task (see, for instance, http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf), the other direction is not. There has been some work on it: check out the PAD project, https://github.com/ikekonglp/PAD, and the paper describing the underlying algorithm: http://homes.cs.washington.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf.
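To give a feel for why the constituency-to-dependency direction is the easy one, here is a minimal sketch of head-rule conversion: pick a head child for each phrase, then attach the heads of the other children to it. The HEAD_RULES table is a toy assumption covering only this sentence, not the rule set from the paper linked above, and the nested-list tree encoding is likewise just for illustration.

```python
# Constituency tree as nested lists: [label, child, ...].
# A preterminal is [tag, word].
TREE = ["S",
        ["NP", ["NNP", "John"]],
        ["VP",
         ["VBD", "saw"],
         ["NP", ["DT", "a"], ["JJ", "flashy"], ["NN", "hat"]],
         ["PP", ["IN", "at"], ["NP", ["DT", "the"], ["NN", "store"]]]]]

# Toy head rules (assumed for this example only): for each phrase label,
# the child labels to look for, in priority order.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD"],
    "NP": ["NN", "NNP"],
    "PP": ["IN"],
}


def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def head_leaf(node):
    """Return the preterminal node that heads this subtree."""
    if is_preterminal(node):
        return node
    label, children = node[0], node[1:]
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return head_leaf(child)
    return head_leaf(children[-1])  # fallback: rightmost child


def dependencies(node, arcs=None):
    """Collect (head word, dependent word) arcs from a constituency tree."""
    if arcs is None:
        arcs = []
    if is_preterminal(node):
        return arcs
    head = head_leaf(node)
    for child in node[1:]:
        child_head = head_leaf(child)
        # Compare by identity so repeated words stay distinct.
        if child_head is not head:
            arcs.append((head[1], child_head[1]))
        dependencies(child, arcs)
    return arcs


for head, dependent in dependencies(TREE):
    print(head, "->", dependent)
```

Going the other way means inventing the phrase labels and bracketing that the dependency graph does not record, which is why it needs a trained model such as PAD rather than a lookup table.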

You may also want to reconsider whether you really need a constituency parse. Here is a good argument: https://linguistics.stackexchange.com/questions/7280/why-is-constituency-needed-since-dependency-gets-the-job-done-more-easily-and-e

adam.ra answered Oct 18 '22 01:10