Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I manipulate parse trees?

I've been playing around with natural language parse trees and manipulating them in various ways. I've been using Stanford's Tregex and Tsurgeon tools but the code is a mess and doesn't fit in well with my mostly Python environment (those tools are Java and aren't ideal for tweaking). I'd like to have a toolset that would allow for easy hacking when I need more functionality. Are there any other tools that are well suited for doing pattern matching on trees and then manipulation of those matched branches?

For example, I'd like to take the following tree as input:

(ROOT
  (S
    (NP
      (NP (NNP Bank))
      (PP (IN of)
        (NP (NNP America))))
    (VP (VBD used)
      (S
        (VP (TO to)
          (VP (VB be)
            (VP (VBN called)
              (NP
                (NP (NNP Bank))
                (PP (IN of)
                  (NP (NNP Italy)))))))))))

and (this is a simplified example):

  1. Find any node with the label NP that has a first child with the label NP and some descendent named "Bank", and a second child with the label PP.
  2. If that matches, then take all of the children of the PP node and move them to end of the matched NP's children.

For example, take this part of the tree:

(NP
  (NP (NNP Bank))
  (PP (IN of)
    (NP (NNP America))))

and turn it into this:

(NP
  (NP (NNP Bank) (IN of) (NP (NNP America))))

Since my input trees are S-expressions I've considered using Lisp (embedded into my Python program) but it's been so long that I've written anything significant in Lisp that I have no idea where to even start.

What would be a good way to describe the patterns? What would be a good way to describe the manipulations? What's a good way to think about this problem?

like image 777
guidoism Avatar asked Sep 12 '10 01:09

guidoism


People also ask

Can a sentence have multiple parse trees?

Grammars in NLP basically correspond to Context-free Grammars(CFG) in formal Language theory. And, in case the CFG corresponding to the NLP task is ambiguous, then corresponding to a single sentence (more formally derivation), there can be multiple Parse Trees.


2 Answers

Beauty is in the eye of the beholder. But you never say how the code of Tregex or Tsurgeon is a mess. It sounds more like you can't deal with Java or greater abstraction and so you're looking for something concrete written in Python.

There's nothing wrong with hand-writing tree matching and transformation functions. Indeed, we used to do that all the time. But after the first couple of hundred, it seemed like there had to be a better way, and hence we moved to using the domain-specific languages of Tregex and Tsurgeon. This is generally seen as a laudable programming style. See in Wikipedia. They're well-specified languages with an exact syntax specification, etc. Here is your example using them.

Tree t = Tree.valueOf("(ROOT (S (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America)))) (VP (VBD used) (S (VP (TO to) (VP (VB be) (VP (VBN called) (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP Italy)))))))))))");
TregexPattern pat = TregexPattern.compile("NP <1 (NP << Bank) <2 PP=remove");
TsurgeonPattern surgery = Tsurgeon.parseOperation("excise remove remove");
Tsurgeon.processPattern(pat, surgery, t).pennPrint();

Note that the Java code is actually shorter than the Lisp code, precisely because of use of the domain-specific language. It's hard to see how this could be simpler: specify pattern, specify operation, apply.

But if you'd prefer to be hand-writing methods that match patterns on trees and change them into other trees in Python, then you're most welcome to go off and do that.

like image 87
Christopher Manning Avatar answered Oct 24 '22 03:10

Christopher Manning


This is a typical case of using Lisp. You would need a function that maps another function over the tree.

Here is a procedural matching example using Common Lisp. There are matchers in Lisp that work over list structures, which could be used instead. Using a list matcher would simplify the example (see my other answer for an example using a pattern matcher).

The code:

(defun node-children (node)
  (rest node))

(defun node-name (node)
  (second node))

(defun node-type (node)
  (first node))


(defun treemap (tree matcher transformer)
  (cond ((null tree) nil)
        ((consp tree)
         (if (funcall matcher tree)
             (funcall transformer tree)
           (cons (node-type tree)
                 (mapcar (lambda (child)
                           (treemap child matcher transformer))
                         (node-children tree)))))
        (t tree))))

The example:

(defvar *tree*
  '(ROOT
    (S
     (NP
      (NP (NNP Bank))
      (PP (IN of)
          (NP (NNP America))))
     (VP (VBD used)
         (S
          (VP (TO to)
              (VP (VB be)
                  (VP (VBN called)
                      (NP
                       (NP (NNP Bank))
                       (PP (IN of)
                           (NP (NNP Italy))))))))))))



(defun example ()
  (pprint
   (treemap *tree*
            (lambda (node)
              (and (= (length (node-children node)) 2)
                   (eq (node-type (first (node-children node))) 'np)
                   (some (lambda (node)
                           (eq (node-name node) 'bank))
                         (children (first (node-children node))))
                   (eq (first (second (node-children node))) 'pp)))
            (lambda (node)
              (list (node-type node)
                    (append (first (node-children node))
                            (node-children (second (node-children node)))))))))

Running the example:

CL-USER 75 > (example)

(ROOT
 (S
  (NP
   (NP (NNP BANK) (IN OF) (NP (NNP AMERICA))))
  (VP
   (VBD USED)
   (S
    (VP
     (TO TO)
     (VP
      (VB BE)
      (VP
       (VBN CALLED)
       (NP
        (NP
         (NNP BANK)
         (IN OF)
         (NP (NNP ITALY)))))))))))
like image 21
Rainer Joswig Avatar answered Oct 24 '22 03:10

Rainer Joswig