 

How do I do dependency parsing in NLTK?

Going through the NLTK book, it's not clear how to generate a dependency tree from a given sentence.

The relevant section of the book (the sub-chapter on dependency grammar) gives an example figure, but it doesn't show how to parse a sentence to arrive at those relationships. Or am I missing something fundamental in NLP?

EDIT: I want something similar to what the Stanford parser does. Given the sentence "I shot an elephant in my sleep", it should return something like:

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
asked Sep 16 '11 at 10:09 by MrD



People also ask

How does dependency parser work?

Dependency parsing analyzes the grammatical structure of a sentence in terms of the dependencies between its words: every word except the root is attached to a head word, and a tag (the dependency label, such as nsubj or dobj) names the relationship between the two. These tags are the dependency tags.
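As a rough illustration in plain Python (not an NLTK API), a dependency parse can be stored as one head position and one label per word. The hand-written parse list below encodes the relations from the question, with head 0 marking the root; the names parse and bracketed are hypothetical. Following the head links down from the root rebuilds the tree:

```python
from collections import defaultdict

# Hypothetical hand-written parse of "I shot an elephant in my sleep",
# using the relations from the question: (position, word, head position, label).
# Head 0 marks the root of the sentence.
parse = [
    (1, "I", 2, "nsubj"),
    (2, "shot", 0, "root"),
    (3, "an", 4, "det"),
    (4, "elephant", 2, "dobj"),
    (5, "in", 2, "prep"),
    (6, "my", 7, "poss"),
    (7, "sleep", 5, "pobj"),
]


def bracketed(rows):
    """Rebuild the tree from head links and render it as a bracketed string."""
    words = {pos: word for pos, word, _, _ in rows}
    children = defaultdict(list)
    root = None
    for pos, _, head, _ in rows:
        if head == 0:
            root = pos
        else:
            children[head].append(pos)  # rows are in sentence order

    def render(pos):
        # A word with no dependents is a leaf; otherwise wrap it with its subtrees.
        if not children[pos]:
            return words[pos]
        parts = [words[pos]] + [render(child) for child in children[pos]]
        return "(" + " ".join(parts) + ")"

    return render(root)


print(bracketed(parse))  # (shot I (elephant an) (in (sleep my)))
```

The head and label columns are all a dependency parser has to predict; the tree itself follows from them.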

What is parsing in NLTK?

NLTK's parsers (the nltk.parse package) are classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as “parsing” the text, and the resulting tree structures are called the text's “parses”.

Why do we need dependency parsing?

Dependency parsing builds a parse tree whose tags encode the relationships between words, rather than deriving the structure from grammar rules as constituency (phrase-structure) parsing does. This gives a lot of flexibility, even when the word order changes (as in 'boy handsome' versus 'handsome boy').


3 Answers

You can use the Stanford Parser from NLTK.

Requirements

You need to download two things from their website:

  1. The Stanford CoreNLP parser.
  2. The language model for your desired language (e.g. the English language model).

Warning!

Make sure that your language model version matches your Stanford CoreNLP parser version!

The current CoreNLP version as of May 22, 2018 is 3.9.1.

After downloading the two files, extract the zip file anywhere you like.

Python Code

Next, load the model and use it through NLTK:

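from nltk.parse.stanford import StanfordDependencyParser

# Adjust these paths to wherever you extracted the parser and its models jar (requires Java)
path_to_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
path_to_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'

# Newer NLTK releases warn that this class is deprecated in favour of
# nltk.parse.corenlp.CoreNLPDependencyParser
dependency_parser = StanfordDependencyParser(path_to_jar=path_to_jar, path_to_models_jar=path_to_models_jar)

result = dependency_parser.raw_parse('I shot an elephant in my sleep')
dep = next(result)  # raw_parse returns an iterator of DependencyGraph objects; result.next() only works on Python 2

list(dep.triples())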

Output

The output of the last line (printed under Python 2, hence the u'' string prefixes) is:

[((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')),
 ((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')),
 ((u'elephant', u'NN'), u'det', (u'an', u'DT')),
 ((u'shot', u'VBD'), u'prep', (u'in', u'IN')),
 ((u'in', u'IN'), u'pobj', (u'sleep', u'NN')),
 ((u'sleep', u'NN'), u'poss', (u'my', u'PRP$'))]

I think this is what you want.
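triples() returns ((head, tag), relation, (dependent, tag)) tuples, which is close to, but not quite, the rel(head, dependent) notation shown in the question. A small formatting helper closes the gap. This is a sketch: to_relations is a hypothetical name, and the list is the output above copied in so the example runs without Java or the Stanford jars.

```python
# The triples printed above; with a live parser use list(dep.triples()) instead.
triples = [
    (("shot", "VBD"), "nsubj", ("I", "PRP")),
    (("shot", "VBD"), "dobj", ("elephant", "NN")),
    (("elephant", "NN"), "det", ("an", "DT")),
    (("shot", "VBD"), "prep", ("in", "IN")),
    (("in", "IN"), "pobj", ("sleep", "NN")),
    (("sleep", "NN"), "poss", ("my", "PRP$")),
]


def to_relations(triples):
    """Turn ((head, tag), rel, (dep, tag)) triples into 'rel(head, dep)' strings."""
    return [f"{rel}({head}, {dep})" for (head, _), rel, (dep, _) in triples]


for line in to_relations(triples):
    print(line)  # first line printed: nsubj(shot, I)
```

triples() drops word positions, so the -2 style suffixes from the question cannot be recovered from it. The DependencyGraph object itself keeps a nodes mapping keyed by token address (with word, head and rel fields), which is where to look for them.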

answered Oct 17 '22 at 16:10 by ywat



I think you could use a corpus-based dependency parser instead of the grammar-based one NLTK provides.

Doing corpus-based dependency parsing on even a small amount of text in Python is not ideal performance-wise, so NLTK instead provides a wrapper for MaltParser, a corpus-based dependency parser.
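For comparison, NLTK's own grammar-based route looks roughly like the following sketch, adapted from the dependency grammar section of the NLTK book (nltk.DependencyGrammar and nltk.ProjectiveDependencyParser). The grammar is hand-written to cover just this one sentence, which shows why this approach does not scale to arbitrary text:

```python
import nltk

# A toy dependency grammar: each rule lists which words a head word may govern.
# It only covers this one sentence; real text would need far more rules.
grammar = nltk.DependencyGrammar.fromstring("""
'shot' -> 'I' | 'elephant' | 'in'
'elephant' -> 'an' | 'in'
'in' -> 'sleep'
'sleep' -> 'my'
""")

parser = nltk.ProjectiveDependencyParser(grammar)
sentence = 'I shot an elephant in my sleep'.split()

# parse() yields one nltk.Tree per analysis the grammar allows. 'in' may
# attach to either 'shot' or 'elephant', so this sentence gets two trees.
trees = list(parser.parse(sentence))
for tree in trees:
    print(tree)
```

Note that these trees carry no relation labels such as nsubj or dobj; the grammar only says which word may depend on which.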

You might find this other question about RDF representation of sentences relevant.

answered Oct 17 '22 at 16:10 by Neodawn



If you need better performance, spaCy (https://spacy.io/) is the best choice. Usage is very simple:

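import spacy

# spaCy v3 removed the 'en' shortcut; download a named pipeline first:
#   python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
doc = nlp('A woman is walking through the door.')
for token in doc:
    print(token.text, token.dep_, token.head.text)  # word, dependency label, head word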

The result is a parsed Doc in which every token carries its dependency label and its head, so you can easily extract whatever information you need. You can also define your own custom pipelines; see their website for more.

https://spacy.io/docs/usage/

answered Oct 17 '22 at 17:10 by Aleksandar Jovanovic
