
Extracting the text from an output parse tree

I am new to NLP. I am trying to use the Stanford parser to extract the noun phrases from a text, that is, the parts of the text that are tagged (NP ).

If a part is tagged (NP ) and a smaller part inside it is also tagged (NP ), I want to take the smaller part.

So far I have managed to do this with the following method:

private static ArrayList<Tree> extract(Tree t)
{
    ArrayList<Tree> wanted = new ArrayList<Tree>();
    if (t.label().value().equals("NP"))
    {
        // Tentatively keep this NP; drop it again if an NP is found below it.
        wanted.add(t);
        for (Tree child : t.children())
        {
            ArrayList<Tree> temp = extract(child);
            if (temp.size() > 0)
            {
                int o = wanted.indexOf(t);
                if (o != -1)
                    wanted.remove(o);
            }
            wanted.addAll(temp);
        }
    }
    else
    {
        for (Tree child : t.children())
            wanted.addAll(extract(child));
    }
    return wanted;
}

The method returns a list of trees. When I run the following:

LexicalizedParser parser = LexicalizedParser.loadModel();
Tree x = parser.apply("Who owns club barcelona?");
ArrayList<Tree> outs = extract(x);
for (int i = 0; i < outs.size(); i++) {
    System.out.println("tree #" + i + ": " + outs.get(i));
}

the output is:

tree #0: (NP (NN club) (NN barcelona))

I want the output to be "club barcelona" directly, without the tags. I tried .labels() and .label().value(), but they return the tags instead.

smohamed asked Sep 20 '12 14:09


1 Answer

You can get a list of the words under a subtree tr with

tr.yield()

You can convert that to just the String form with convenience methods in Sentence:

Sentence.listToString(tr.yield())
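
To make it concrete what "the words under a subtree" means, here is a minimal, self-contained sketch on a toy tree. ToyTree, yieldWords and yieldString are hypothetical names invented for this illustration; this is not the Stanford Tree class or its implementation, only the same idea of collecting the leaves left to right and joining them with spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse tree node; NOT the Stanford Tree class.
class ToyTree {
    final String label;
    final List<ToyTree> children = new ArrayList<>();

    ToyTree(String label, ToyTree... kids) {
        this.label = label;
        for (ToyTree k : kids) children.add(k);
    }

    boolean isLeaf() { return children.isEmpty(); }

    // Collect the leaf labels (the words) left to right, skipping all tags.
    List<String> yieldWords() {
        List<String> words = new ArrayList<>();
        collect(this, words);
        return words;
    }

    private static void collect(ToyTree t, List<String> out) {
        if (t.isLeaf()) { out.add(t.label); return; }
        for (ToyTree c : t.children) collect(c, out);
    }

    // Join the words with single spaces to get a plain string.
    String yieldString() { return String.join(" ", yieldWords()); }

    public static void main(String[] args) {
        // Same shape as (NP (NN club) (NN barcelona))
        ToyTree np = new ToyTree("NP",
            new ToyTree("NN", new ToyTree("club")),
            new ToyTree("NN", new ToyTree("barcelona")));
        System.out.println(np.yieldString()); // prints: club barcelona
    }
}
```

The tags live on the internal nodes and the words on the leaves, which is why reading labels on the NP node itself only ever gives back tags.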

You can just walk a tree as you're doing, but if you're going to do this kind of thing much, you might want to look at Tregex, which makes it easier to find particular nodes in trees via declarative patterns, such as NPs with no NP below them. A neat way to do what you are looking for is this:

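// Assumes lp is an already loaded LexicalizedParser and these imports:
// edu.stanford.nlp.ling.Sentence, edu.stanford.nlp.trees.Tree,
// edu.stanford.nlp.trees.tregex.TregexMatcher, edu.stanford.nlp.trees.tregex.TregexPattern
Tree x = lp.apply("Christopher Manning owns club barcelona?");
// Match every NP that does not dominate another NP.
TregexPattern NPpattern = TregexPattern.compile("@NP !<< @NP");
TregexMatcher matcher = NPpattern.matcher(x);
while (matcher.findNextMatchingNode()) {
  Tree match = matcher.getMatch();
  System.out.println(Sentence.listToString(match.yield()));
}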
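
For readers who want to see what the pattern "@NP !<< @NP" selects without setting up the parser, the sketch below applies the same rule by hand to a toy tree: keep a node if it is labelled NP and no node anywhere below it is labelled NP. NpNode, dominatesNp and innermostNps are hypothetical names made up for this example. It is not how Tregex is implemented, and the toy tree is hand-built rather than parser output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy node used only for illustration; NOT the Stanford Tree class.
class NpNode {
    final String label;
    final List<NpNode> children = new ArrayList<>();

    NpNode(String label, NpNode... kids) {
        this.label = label;
        for (NpNode k : kids) children.add(k);
    }

    // True if any proper descendant is labelled NP (the "<< @NP" part).
    boolean dominatesNp() {
        for (NpNode c : children) {
            if (c.label.equals("NP") || c.dominatesNp()) return true;
        }
        return false;
    }

    // Words under this node, left to right, joined by spaces.
    String words() {
        if (children.isEmpty()) return label;
        List<String> parts = new ArrayList<>();
        for (NpNode c : children) parts.add(c.words());
        return String.join(" ", parts);
    }

    // Pre-order walk collecting the text of every NP with no NP below it.
    static List<String> innermostNps(NpNode root) {
        List<String> found = new ArrayList<>();
        walk(root, found);
        return found;
    }

    private static void walk(NpNode t, List<String> out) {
        if (t.label.equals("NP") && !t.dominatesNp()) out.add(t.words());
        for (NpNode c : t.children) walk(c, out);
    }

    public static void main(String[] args) {
        // (NP (NP (DT the) (NN owner)) (PP (IN of) (NP (NN club) (NN barcelona))))
        NpNode tree = new NpNode("NP",
            new NpNode("NP",
                new NpNode("DT", new NpNode("the")),
                new NpNode("NN", new NpNode("owner"))),
            new NpNode("PP",
                new NpNode("IN", new NpNode("of")),
                new NpNode("NP",
                    new NpNode("NN", new NpNode("club")),
                    new NpNode("NN", new NpNode("barcelona")))));
        System.out.println(innermostNps(tree)); // prints: [the owner, club barcelona]
    }
}
```

The outer NP is skipped because it contains other NPs, which is the "take the smaller part" behaviour asked for in the question.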
Christopher Manning answered Oct 31 '22 23:10