Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing multiple sentences with MaltParser using NLTK

There have been many MaltParser and/or NLTK related questions:

  • Malt Parser throwing class not found exception
  • How to use malt parser in python nltk
  • MaltParser Not Working in Python NLTK
  • NLTK MaltParser won't parse
  • Dependency parser using NLTK and MaltParser
  • Dependency Parsing using MaltParser and NLTK
  • Parsing with MaltParser engmalt
  • Parse raw text with MaltParser in Java

Now, there's a more stabilized version of MaltParser API in NLTK: https://github.com/nltk/nltk/pull/944 but there are issues when it comes to parsing multiple sentences at the same time.

Parsing one sentence at a time seems fine:

_path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
_path_to_model= '/home/alvas/engmalt.linear-1.7.mco'     
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)

But parsing a list of sentences doesn't return a DependencyGraph object:

_path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
_path_to_model= '/home/alvas/engmalt.linear-1.7.mco'     
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)
>>> print(next(mp.parse_sents([sent,sent2])))
<listiterator object at 0x7f0a2e4d3d90> 
>>> print(next(next(mp.parse_sents([sent,sent2]))))
[{u'address': 0,
  u'ctag': u'TOP',
  u'deps': [2],
  u'feats': None,
  u'lemma': None,
  u'rel': u'TOP',
  u'tag': u'TOP',
  u'word': None},
 {u'address': 1,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'I'},
 {u'address': 2,
  u'ctag': u'NN',
  u'deps': [1, 11],
  u'feats': u'_',
  u'head': 0,
  u'lemma': u'_',
  u'rel': u'null',
  u'tag': u'NN',
  u'word': u'shot'},
 {u'address': 3,
  u'ctag': u'AT',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'AT',
  u'word': u'an'},
 {u'address': 4,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'elephant'},
 {u'address': 5,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'in'},
 {u'address': 6,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'my'},
 {u'address': 7,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'pajamas'},
 {u'address': 8,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'Time'},
 {u'address': 9,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'flies'},
 {u'address': 10,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'like'},
 {u'address': 11,
  u'ctag': u'NN',
  u'deps': [3, 4, 5, 6, 7, 8, 9, 10],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'dep',
  u'tag': u'NN',
  u'word': u'banana'}]

Why is that using parse_sents() don't return an iterable of parse_one?

I could however, just get lazy and do:

_path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
_path_to_model= '/home/alvas/engmalt.linear-1.7.mco'     
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent1 = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> sentences = [sent1, sent2]
>>> for sent in sentences:
>>> ...    print(mp.parse_one(sent).tree())

But this is not the solution I'm looking for. My question is how to answer why doesn't the parse_sent() return an iterable of parse_one(). and how could it be fixed in the NLTK code?


After @NikitaAstrakhantsev answered, I've tried it outputs a parse tree now but it seems to be confused and puts both sentences into one before parsing it.

# Initialize a MaltParser object with a pre-trained model.
mp = MaltParser(path_to_maltparser=path_to_maltparser, model=path_to_model) 
sent = 'I shot an elephant in my pajamas'.split()
sent2 = 'Time flies like banana'.split()
# Parse a single sentence.
print(mp.parse_one(sent).tree())
print(next(next(mp.parse_sents([sent,sent2]))).tree())

[out]:

(pajamas (shot I) an elephant in my)
(shot I (banana an elephant in my pajamas Time flies like))

From the code it seems to be doing something weird: https://github.com/nltk/nltk/blob/develop/nltk/parse/api.py#L45

Why is it that the parser abstract class in NLTK is swooshing two sentences into one before parsing? Am I calling the parse_sents() incorrectly? If so, what is the correct way to call parse_sents()?

like image 900
alvas Avatar asked May 26 '15 13:05

alvas


1 Answers

As I see in your code samples, you don't call tree() in this line

>>> print(next(next(mp.parse_sents([sent,sent2])))) 

while you do call tree() in all cases with parse_one().

Otherwise I don't see the reason why it could happen: parse_one() method of ParserI isn't overridden in MaltParser and everything it does is simply calling parse_sents() of MaltParser, see the code.

Upd: The line you're talking about isn't called, because parse_sents() is overridden in MaltParser and is directly called.

The only guess I have now is that java lib maltparser doesn't work correctly with input file containing several sentences (I mean this block - where java is run). Maybe original malt parser has changed the format and now it is not '\n\n'. Unfortunately, I can't run this code by myself, because maltparser.org is down for the second day. I checked that the input file has expected format (sentences are separated by double endline), so it is very unlikely that python wrapper merges sentences.

like image 83
Nikita Astrakhantsev Avatar answered Oct 17 '22 13:10

Nikita Astrakhantsev