Replace personal pronoun with previous person mentioned (noisy coref)

Question

I want to do a noisy coreference resolution such that, given a personal pronoun, that pronoun is replaced by the previous (nearest) person mentioned.

For example:

Alex is looking at buying a U.K. startup for $1 billion. He is very confident that this is going to happen. Sussan is also in the same situation. However, she has lost hope.

The output is:

Alex is looking at buying a U.K. startup for $1 billion. Alex is very confident that this is going to happen. Sussan is also in the same situation. However, Sussan has lost hope.

Another example:

Peter is a friend of Gates. But Gates does not like him.

In this case, the output would be:

Peter is a friend of Gates. But Gates does not like Gates.

Yes! This is super noisy.

Using spaCy, I have extracted the persons with NER, but how can I replace the pronouns appropriately?

Code:

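import spacy

nlp = spacy.load("en_core_web_sm")
text = "Peter is a friend of Gates. But Gates does not like him."
doc = nlp(text)  # run the pipeline on the input text

# Print every entity that the NER component tagged as a person
for ent in doc.ents:
    if ent.label_ == 'PERSON':
        print(ent.text, ent.label_)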

Sergey Bushmanov · Accepted Answer

There is a dedicated library for coreference resolution, neuralcoref. See the minimal reproducible example below:

import spacy
import neuralcoref

nlp = spacy.load('en_core_web_sm')
neuralcoref.add_to_pipe(nlp)
doc = nlp(
'''Alex is looking at buying a U.K. startup for $1 billion. 
He is very confident that this is going to happen. 
Sussan is also in the same situation. 
However, she has lost hope.
Peter is a friend of Gates. But Gates does not like him.
          ''')

print(doc._.coref_resolved)

Output:

Alex is looking at buying a U.K. startup for $1 billion. 
Alex is very confident that this is going to happen. 
Sussan is also in the same situation. 
However, Sussan has lost hope.
Peter is a friend of Gates. But Gates does not like Peter.

Note: you may run into issues with neuralcoref if you install it with pip, so it is better to build it from source, as I outlined here

thorntonc · Answer

I have written a function that works for your two examples. Consider using a larger model such as en_core_web_lg for more accurate tagging:

import spacy
from string import punctuation

nlp = spacy.load("en_core_web_lg")

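def pronoun_coref(text):
    doc = nlp(text)
    # Personal pronouns (tag PRP) and PERSON entities, each paired with a token index
    pronouns = [(tok, tok.i) for tok in doc if (tok.tag_ == "PRP")]
    names = [(ent.text, ent[0].i) for ent in doc.ents if ent.label_ == 'PERSON']
    doc = [tok.text_with_ws for tok in doc]
    for p in pronouns:
        # Nearest PERSON entity that starts before the pronoun, or False if there is none
        replace = max(filter(lambda x: x[1] < p[1], names),
                      key=lambda x: x[1], default=False)
        if replace:
            replace = replace[0]
            if doc[p[1] - 1] in punctuation:
                replace = ' ' + replace
            # Guard the lookahead so a pronoun that is the last token does not raise IndexError
            if p[1] + 1 < len(doc) and doc[p[1] + 1] not in punctuation:
                replace = replace + ' '
            doc[p[1]] = replace
    doc = ''.join(doc)
    return doc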
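
For illustration, the "nearest previous person" heuristic from the question can also be sketched without spaCy. This is only a rough sketch under stated assumptions: the function name replace_with_nearest_person, the small PERSONAL_PRONOUNS set, and the hand-written tokens and person_starts inputs are all hypothetical stand-ins for what a tagger and NER model would provide. Each token is a (text, trailing whitespace) pair, which mirrors spaCy's token.text and token.whitespace_, so replacing only the text keeps the original spacing without any punctuation checks.

```python
from bisect import bisect_left

# Assumption: a small illustrative pronoun list, not an exhaustive one.
PERSONAL_PRONOUNS = {"he", "she", "him", "her"}


def replace_with_nearest_person(tokens, person_starts):
    """Replace each pronoun with the closest person mentioned before it.

    tokens: list of (text, trailing_whitespace) pairs.
    person_starts: dict mapping the token index where a person mention
        starts to the full name of that mention.
    """
    starts = sorted(person_starts)
    out = []
    for i, (text, ws) in enumerate(tokens):
        if text.lower() in PERSONAL_PRONOUNS:
            # Number of person mentions that start before token i
            pos = bisect_left(starts, i)
            if pos > 0:
                text = person_starts[starts[pos - 1]]
        # Keep the token's own trailing whitespace so spacing is unchanged
        out.append(text + ws)
    return "".join(out)


# Hand-tokenized version of the second example from the question
tokens = [("Peter", " "), ("is", " "), ("a", " "), ("friend", " "),
          ("of", " "), ("Gates", ""), (".", " "), ("But", " "),
          ("Gates", " "), ("does", " "), ("not", " "), ("like", " "),
          ("him", ""), (".", "")]
person_starts = {0: "Peter", 5: "Gates", 8: "Gates"}

result = replace_with_nearest_person(tokens, person_starts)
print(result)  # Peter is a friend of Gates. But Gates does not like Gates.
```

Note that this reproduces the noisy output asked for in the question ("Gates does not like Gates"), whereas the neuralcoref output above resolves "him" to Peter. A pronoun that appears before any person mention is left unchanged.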


Tags: python, python-3.x, nlp, spacy, coreference-resolution

Jesujoba Oluwadara ALABI
