
Replacing a pronoun with its antecedent using Python 2.7 and NLTK

As the title shows, I am trying to find pronouns in a string and replace each one with its antecedent, like:

[in]: "the princess looked from the palace, she was happy".
[out]: "the princess looked from the palace, the princess was happy". 

I use POS tagging to find the pronouns and nouns. I need to know how to do the replacement without knowing the sentence in advance, that is, how to identify the subject of the sentence so that I can replace the pronoun with it. Any suggestions?

Mony T asked Apr 07 '13 09:04


1 Answer

I don't know the NLTK package (I've never used it), but it seems to give you the answer right away. If you look at the parse tree example on nltk.org, it shows the subject labeled with an 'NP-SBJ' tag. Isn't this what you're looking for?
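To make the 'NP-SBJ' idea concrete, here is a minimal sketch that pulls the words under a given label out of a Treebank-style bracketed parse. It uses only the standard library. The parse string is written by hand for illustration (it is not real parser output), and the function name `words_under_label` is made up for this example. If you have NLTK installed, `nltk.Tree.fromstring` should be able to load the same kind of string into a proper tree object instead.

```python
import re


def words_under_label(bracketed, label):
    """Return the words of the first constituent carrying `label`
    in a Penn-Treebank-style bracketed parse string."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    words = []
    depth = 0            # current nesting depth
    start_depth = None   # depth at which the wanted constituent opened
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            depth += 1
            node_label = tokens[i + 1]
            if start_depth is None and node_label == label:
                start_depth = depth
            i += 2       # skip "(" and the node label
            continue
        if tok == ")":
            if start_depth is not None and depth == start_depth:
                return words   # the wanted constituent just closed
            depth -= 1
        elif start_depth is not None:
            words.append(tok)  # a word inside the wanted constituent
        i += 1
    return words


# Hand-written parse of the example sentence (an assumption, not parser output).
parse = ("(S (NP-SBJ (DT the) (NN princess)) "
         "(VP (VBD looked) (PP (IN from) (NP (DT the) (NN palace)))))")

print(words_under_label(parse, "NP-SBJ"))  # ['the', 'princess']
```

Getting such a parse for arbitrary input is the hard part; this only shows what to do once you have one.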

(Earlier, I overlooked the 'nltk' part in the title and I wrote the part below. I think it may be interesting as a general introduction on how to solve problems like this (especially if you don't have a package available), so I'll leave it here:)

This is more a 'natural language' (i.e. English language) question than a Python question. Could you be more specific in what kind of sentences you expect? Should it work for all possible English sentences? I think that would be really difficult.

If the sentences are 'easy' enough, it may be sufficient to assume that everything before the first verb is the subject. This works for your example, but doesn't work for the following sentences:

yesterday the princess looked from the palace, she was happy.
the princess who drank tea looked from the palace, she was happy.

(note that in the latter sentence the subject is "the princess who drank tea"; the part 'who drank tea' is an 'adjective phrase', or more precisely a relative clause).
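The "everything before the first verb" heuristic can be sketched in a few lines. This assumes the input is already a list of (word, tag) pairs with Penn Treebank tags, which is the shape `nltk.pos_tag` returns. The tags below are assigned by hand for illustration, and the function name is made up.

```python
def subject_before_first_verb(tagged):
    """Naive heuristic: every token before the first verb is the subject."""
    subject = []
    for word, tag in tagged:
        if tag.startswith("VB"):   # VB, VBD, VBZ, ... are verb tags
            break
        subject.append(word)
    return " ".join(subject)


# Hand-tagged examples (assumed tags, not tagger output).
simple = [("the", "DT"), ("princess", "NN"), ("looked", "VBD"),
          ("from", "IN"), ("the", "DT"), ("palace", "NN")]
with_adverb = [("yesterday", "NN")] + simple
with_clause = [("the", "DT"), ("princess", "NN"), ("who", "WP"),
               ("drank", "VBD"), ("tea", "NN"), ("looked", "VBD"),
               ("from", "IN"), ("the", "DT"), ("palace", "NN")]

print(subject_before_first_verb(simple))       # the princess
print(subject_before_first_verb(with_adverb))  # yesterday the princess
print(subject_before_first_verb(with_clause))  # the princess who
```

The last two results show exactly where the heuristic breaks: it keeps "yesterday", and it cuts the subject off at the verb inside the clause.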

Also, specify what should happen if the pronoun does not point to the subject (but to the object for example):

the princess looked at the prince, he was happy.

In order to solve your problem in the most general case, you should find (or make) a formal specification (a grammar) of the English (or any other) language, which tells you exactly which part of the sentence is the subject, the verb, the object, etc. Example: many simple English sentences have the following form (parts between brackets [] are optional; parts between parentheses () are alternatives, i.e., (the|a) means that you choose either 'the' or 'a'):

sentence := subject verb [object]

Each part on the right side of the specification needs to be specified in more detail, e.g.:

subject, object := (noun_part_of_sentence|noun_part_of_sentence_plural)
noun_part_of_sentence := article [adjectivelist] [noun_modifier] noun # I guess there is a formal name for this...
noun_part_of_sentence_plural := [adjectivelist] [noun_modifier] noun_plural # note: no article
adjectivelist := adjective [adjectivelist] # i.e., one or more adjectives

For more complex sentences, such as the one above with the adjective phrase, the above specification does not suffice; it should be something like:

noun_part_of_sentence := (the|a) [adjectivelist] [noun_modifier] noun [adjective_phrase]
adjective_phrase := relative_pronoun verb [object]
relative_pronoun := (who|which|that)

Note that the specification above is already quite powerful: if you are able to correctly identify the type of each word (verb, noun, article, etc.), it can successfully recognize the following sentences:

The princess drank the tea.
The beautiful princess drank the tea.
The beautiful princess drank the delicious tea.
A beautiful princess drank the delicious lemon tea.
The beautiful princess who saw the handsome prince drank the refreshing tea.
The beautiful princess who saw the handsome prince who made the tea drank the refreshing tea.

However, it does not (yet) allow for sentences like 'the princess looked at the palace' or 'the princess drank tea' (note: not 'the tea'), or infinitely many others. The trick is to extend your formal specification to the level that is adequate for the type of sentences you expect.
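As a rough sketch of how such a specification turns into code, here is a small hand-written recognizer for the singular rules above. Several things are assumptions for the sake of the example: the word types come from a tiny hand-made `LEXICON` instead of a real tagger, the noun is treated as required, the plural rule is left out, and the parser is greedy (no backtracking). The function names are made up.

```python
# Hypothetical mini-lexicon; a real program would use a POS tagger instead.
LEXICON = {
    "the": "article", "a": "article",
    "beautiful": "adjective", "handsome": "adjective",
    "delicious": "adjective", "refreshing": "adjective",
    "lemon": "noun_modifier",
    "princess": "noun", "prince": "noun", "tea": "noun",
    "drank": "verb", "saw": "verb", "made": "verb",
    "who": "relative_pronoun", "which": "relative_pronoun",
    "that": "relative_pronoun",
}


def parse_noun_part(cats, i):
    """Match article [adjectives] [noun_modifier] noun [adjective_phrase]
    starting at index i. Return the index just past it, or None."""
    if i >= len(cats) or cats[i] != "article":
        return None
    i += 1
    while i < len(cats) and cats[i] == "adjective":
        i += 1
    if i < len(cats) and cats[i] == "noun_modifier":
        i += 1
    if i >= len(cats) or cats[i] != "noun":
        return None
    i += 1
    # optional adjective_phrase := relative_pronoun verb [object]
    if (i + 1 < len(cats) and cats[i] == "relative_pronoun"
            and cats[i + 1] == "verb"):
        i += 2
        after_object = parse_noun_part(cats, i)
        if after_object is not None:
            i = after_object
    return i


def subject_of(sentence):
    """Return the subject if the sentence matches
    'subject verb [object]', otherwise None."""
    words = sentence.rstrip(".").split()
    cats = [LEXICON.get(w.lower()) for w in words]
    end = parse_noun_part(cats, 0)
    if end is None or end >= len(cats) or cats[end] != "verb":
        return None
    rest = end + 1
    if rest < len(cats) and parse_noun_part(cats, rest) != len(cats):
        return None   # leftover words do not form a valid object
    return " ".join(words[:end])


print(subject_of("The princess drank the tea."))
# The princess
print(subject_of("The beautiful princess who saw the handsome prince "
                 "drank the refreshing tea."))
# The beautiful princess who saw the handsome prince
print(subject_of("the princess drank tea"))
# None
```

Each grammar rule becomes one function (or one branch), so extending the specification means adding a rule here in the same style.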

After you have parsed your sentence successfully, you know what the subject is and where any pronouns are, and you can do the substitution. Note, however, that the English language is not unambiguous, for example:

The princess looked at her mother, she was happy.

Does 'she' refer to the princess or to her mother?
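The substitution step itself is the easy part. Here is a minimal sketch, assuming you have already decided which antecedent to use. It does not resolve the ambiguity above; it simply inserts whatever you pass in. The pronoun list and the function name are made up for the example.

```python
import re

# Illustrative list only; extend it to whatever pronouns you need.
SUBJECT_PRONOUNS = ("she", "he", "it", "they")


def replace_pronouns(text, antecedent):
    """Replace every whole-word subject pronoun in `text` with `antecedent`."""
    pattern = r"\b(?:%s)\b" % "|".join(SUBJECT_PRONOUNS)
    return re.sub(pattern, antecedent, text, flags=re.IGNORECASE)


print(replace_pronouns("the princess looked from the palace, she was happy",
                       "the princess"))
# the princess looked from the palace, the princess was happy
```

The `\b` word boundaries matter: without them, the 'he' inside 'the' would be replaced as well.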

Good luck!

P.S. English is not my native language, so I hope I have used the right terms for everything!

Gijs van Oort answered Oct 17 '22 01:10