 

Why is the Stanford parser with NLTK not parsing a sentence correctly?

I am using the Stanford parser with NLTK in Python, and I followed the question "Stanford Parser and NLTK" to set up the Stanford NLP libraries.

from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser

# Both parsers load the same English PCFG model.
model_path = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
parser = StanfordParser(model_path=model_path)
dep_parser = StanfordDependencyParser(model_path=model_path)

one = "John sees Bill"

# Constituency parse: print each tree and open it in the GUI viewer.
for tree in parser.raw_parse(one):
    print(tree)
    tree.draw()

# Dependency parse: convert each dependency graph to a tree.
dep_trees = [parse.tree() for parse in dep_parser.raw_parse(one)]
print(dep_trees)

# GUI
for tree in dep_trees:
    print(tree)
    tree.draw()

I am getting incorrect parse and dependency trees, as shown in the example below: the parser treats 'sees' as a noun instead of a verb.

Example parse tree Example dependency tree

What should I do? It works correctly when I change the sentence, e.g. (one = 'John see Bill'). The correct output for this sentence can be viewed here: correct output of parse tree

Example of correct output is also shown below:

correctly parsed

correct dependency parsed tree
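
One way to confirm which tag the parser gave 'sees', without opening the GUI, is to read the word/tag pairs off the bracketed form of the tree. The sketch below does this with a regular expression on a plain string, so it runs without the Stanford jars. The bracketed string is a hypothetical illustration of a noun-tagged parse, not actual parser output, and `leaf_tags` is a helper name invented here. On a real NLTK tree, `tree.pos()` should give the same kind of (word, tag) list.

```python
import re

def leaf_tags(bracketed):
    """Return (word, tag) pairs for the leaves of a Penn-style bracketed parse."""
    # A pre-terminal has the form "(TAG word)" with no nested parentheses.
    pairs = re.findall(r"\(([^\s()]+)\s+([^\s()]+)\)", bracketed)
    return [(word, tag) for tag, word in pairs]

# Hypothetical bracketed parse, for illustration only (assumed, not real output).
wrong = "(ROOT (NP (NNP John) (NNS sees) (NNP Bill)))"

for word, tag in leaf_tags(wrong):
    print(word, tag)
```

A tag starting with NN on 'sees' indicates a noun reading, while a tag starting with VB indicates a verb reading.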

Nomiluks asked Jan 23 '16 20:01




1 Answer

Once again, no model is perfect (see Python NLTK pos_tag not returning the correct part-of-speech tag) ;P

You can try a "more accurate" parser, using the NeuralDependencyParser.

First set up the parser properly with the correct environment variables (see Stanford Parser and NLTK and https://gist.github.com/alvations/e1df0ba227e542955a8a), then:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordNeuralDependencyParser
>>> parser = StanfordNeuralDependencyParser(model_path="edu/stanford/nlp/models/parser/nndep/english_UD.gz")
>>> stanford_dir = parser._classpath[0].rpartition('/')[0]
>>> slf4j_jar = stanford_dir + '/slf4j-api.jar'
>>> parser._classpath = list(parser._classpath) + [slf4j_jar]
>>> parser.java_options = '-mx5000m'
>>> sent = "John sees Bill"
>>> [parse.tree() for parse in parser.raw_parse(sent)]
[Tree('sees', ['John', 'Bill'])]
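
The `_classpath` lines in the session above take the directory of the first jar NLTK found and append `slf4j-api.jar` from that same directory. A minimal sketch of just that path manipulation, runnable without NLTK, is shown here. The `/opt/stanford-parser/...` paths and the helper name `add_sibling_jar` are assumptions made for illustration, not values NLTK returns.

```python
import os

def add_sibling_jar(classpath, jar_name):
    """Return a new classpath list with jar_name, taken from the
    directory of the first entry, appended at the end."""
    stanford_dir = os.path.dirname(classpath[0])
    return list(classpath) + [os.path.join(stanford_dir, jar_name)]

# Hypothetical classpath, for illustration only.
classpath = (
    "/opt/stanford-parser/stanford-parser.jar",
    "/opt/stanford-parser/stanford-parser-models.jar",
)

for entry in add_sibling_jar(classpath, "slf4j-api.jar"):
    print(entry)
```

Using `os.path.dirname` instead of `rpartition('/')` is a swapped-in choice that avoids hard-coding the path separator.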

Do note that the NeuralDependencyParser only produces dependency trees.

alvas answered Oct 10 '22 01:10