The usual coreference resolution works in the following way. Given
The man likes math. He really does.
it figures out that "he" refers to "the man".
There are plenty of tools to do this.
However, is there a way to do it backwards?
For example, given
The man likes math. The man really does.
I want to do the pronoun resolution "backwards," so that I get an output like
The man likes math. He really does.
My input text will mostly be 3 to 10 sentences, and I'm working in Python.
This is perhaps not a satisfying answer, but I don't think this functionality is built in anywhere. You can code it yourself without too much difficulty, though. Here's an outline of how I'd do it with CoreNLP:
Still run coref. This'll tell you that "the man" and "the man" are coreferent, and so you can replace the second one with a pronoun.
Run the gender annotator from CoreNLP. This is a poorly-documented and even more poorly advertised annotator that tries to attach gender to tokens in a sentence.
Somehow figure out plurals. Most of the time you could use the part-of-speech tag: plural nouns get the tags NNS or NNPS, but there are some complications so you might also want to consider (1) the existence of conjunctions in the antecedent; (2) the lemma of a word being different from its text; (3) especially in conjunction with 2, the word ending in 's' or 'es' -- this can distinguish between lemmatizations which strip out plurals versus lemmatizations which strip out tenses, etc.
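To make the plural heuristics concrete, here is a rough Python sketch. It does not call CoreNLP; the `Token` class, the `is_plural` function, and the idea of passing a head index are all hypothetical names for illustration, and it assumes you already have text, lemma, and Penn Treebank POS tags for each token of the antecedent from whatever pipeline you use.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str   # surface form
    lemma: str  # lemma from your lemmatizer
    pos: str    # Penn Treebank POS tag

PLURAL_TAGS = {"NNS", "NNPS"}

def is_plural(mention, head_index=-1):
    """Guess whether a mention (list of Token) is plural.

    head_index defaults to the last token, a crude stand-in for the
    syntactic head of the noun phrase.
    """
    head = mention[head_index]
    # Main signal: the POS tag of the head noun.
    if head.pos in PLURAL_TAGS:
        return True
    # (1) A coordinating "and" inside the antecedent, e.g. "Alice and Bob".
    if any(t.pos == "CC" and t.text.lower() == "and" for t in mention):
        return True
    # (2) + (3) Fallback for mistagged nouns: the lemma differs from the
    # text and the text ends in "s" (which also covers "es").
    surface = head.text.lower()
    if (head.pos.startswith("NN")
            and surface != head.lemma.lower()
            and surface.endswith("s")):
        return True
    return False
```

Irregular plurals ("men", "children") are only caught by the POS tag here, so the fallback is no substitute for a decent tagger.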
This is enough to figure out the right pronoun. Now it's just a matter of chopping up the sentence and putting it back together. This is a bit of a pain if you do it in CoreNLP -- the code is just not set up to change the text of a sentence -- but in the worst case you can always just re-annotate a new surface form.
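If you'd rather do the chopping in plain Python instead of inside CoreNLP, something like the following might work. It is only a sketch under some assumptions: `choose_pronoun` and `pronominalize` are made-up names, the gender labels "male" / "female" are placeholders for whatever your gender annotator outputs, the coref chain is given as character offsets, and only subject pronouns are handled (mentions in object or possessive position would need "him", "his", and so on).

```python
# Hypothetical gender labels; map your annotator's output onto these.
SINGULAR_SUBJECT_PRONOUNS = {"male": "he", "female": "she"}

def choose_pronoun(gender, plural):
    """Pick a subject pronoun from gender and number."""
    if plural:
        return "they"
    return SINGULAR_SUBJECT_PRONOUNS.get(gender, "it")

def pronominalize(text, chain, gender, plural):
    """Replace every mention in a coref chain except the first.

    chain: list of (start, end) character offsets of coreferent mentions.
    """
    pronoun = choose_pronoun(gender, plural)
    mentions = sorted(chain)
    result = text
    # Go right to left so the offsets of earlier mentions stay valid.
    for start, end in reversed(mentions[1:]):
        prefix = result[:start].rstrip()
        # Capitalize at the start of the text or after sentence punctuation.
        sentence_initial = not prefix or prefix[-1] in ".!?"
        word = pronoun.capitalize() if sentence_initial else pronoun
        result = result[:start] + word + result[end:]
    return result

text = "The man likes math. The man really does."
chain = [(0, 7), (20, 27)]  # offsets of both "The man" mentions
print(pronominalize(text, chain, gender="male", plural=False))
# prints: The man likes math. He really does.
```

Keeping the first mention intact is the simplest policy; for longer texts you may want to leave a full mention every few sentences so the pronoun doesn't become ambiguous.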
Hope this helps somewhat!