How to identify the subject of a sentence?

Tags: python, nlp, nltk

Can Python + NLTK be used to identify the subject of a sentence? From what I have learned so far, a sentence can be broken into a head and its dependents. For example, in "I shot an elephant", "I" and "elephant" are dependents of "shot". But how do I determine that the subject of this sentence is "I"?

singhalc asked Feb 19 '15 22:02


2 Answers

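You can use spaCy. Its dependency parser labels each token with its relation to its head, and the nominal subject carries the label "nsubj".

Code

import spacy

# The 'en' shortcut no longer works in spaCy 3; load a named model instead.
# Install it first with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
sent = "I shot an elephant"
doc = nlp(sent)

# Keep only the tokens whose dependency label marks a nominal subject.
sub_toks = [tok for tok in doc if tok.dep_ == "nsubj"]

print(sub_toks)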
Sohel Khan answered Sep 25 '22 23:09


As the NLTK book (exercise 29) says, "One common way of defining the subject of a sentence S in English is as the noun phrase that is the child of S and the sibling of VP."

Look at the parse tree for the example sentence: "I" is the noun phrase that is a child of S and a sibling of VP, while "an elephant" is a noun phrase inside the VP, so it is not the subject.
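The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parse of "I shot an elephant" is written by hand as nested tuples rather than produced by a parser, and the names PARSE, leaves and find_subjects are made up for this example. The same traversal should carry over to an nltk.Tree, where label() gives the node label and iterating over the tree gives its children.

```python
# Hand-written constituency parse of "I shot an elephant" (an assumption for
# illustration, not parser output). Each node is (label, child, child, ...);
# leaves are plain strings.
PARSE = (
    "S",
    ("NP", ("PRP", "I")),
    ("VP",
        ("VBD", "shot"),
        ("NP", ("DT", "an"), ("NN", "elephant"))),
)


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def find_subjects(node):
    """Collect every NP that is a child of S and has a VP sibling."""
    if isinstance(node, str):
        return []
    label, children = node[0], node[1:]
    child_labels = [c[0] for c in children if not isinstance(c, str)]
    subjects = []
    if label == "S" and "VP" in child_labels:
        for child in children:
            if not isinstance(child, str) and child[0] == "NP":
                subjects.append(" ".join(leaves(child)))
    # Recurse so that subjects of embedded clauses are found as well.
    for child in children:
        subjects.extend(find_subjects(child))
    return subjects


print(find_subjects(PARSE))  # prints ['I']
```

The object "an elephant" is skipped because its parent is VP, not S, which is exactly the distinction the definition draws.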

Nikita Astrakhantsev answered Sep 23 '22 23:09