I am trying to figure out how to rewrite sentences by "resolving" their coreferences, that is, replacing each mention with what it refers to, using Stanford CoreNLP's coreference module.
The idea is to rewrite a sentence like the following:
John drove to Judy’s house. He made her dinner.
into
John drove to Judy’s house. John made Judy dinner.
Here's the code I've been fooling around with:
    private void doTest(String text){
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);
        Map<Integer, CorefChain> corefs = doc.get(CorefChainAnnotation.class);
        List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
        List<String> resolved = new ArrayList<String>();
        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            for (CoreLabel token : tokens) {
                Integer corefClustId= token.get(CorefCoreAnnotations.CorefClusterIdAnnotation.class);
                System.out.println(token.word() + " --> corefClusterID = " + corefClustId);
                CorefChain chain = corefs.get(corefClustId);
                System.out.println("matched chain = " + chain);
                if(chain==null){
                    resolved.add(token.word());
                }else{
                    int sentINdx = chain.getRepresentativeMention().sentNum -1;
                    CoreMap corefSentence = sentences.get(sentINdx);
                    List<CoreLabel> corefSentenceTokens = corefSentence.get(TokensAnnotation.class);
                    String newwords = "";
                    CorefMention reprMent = chain.getRepresentativeMention();
                    System.out.println(reprMent);
                    for(int i = reprMent.startIndex; i<reprMent.endIndex; i++){
                        CoreLabel matchedLabel = corefSentenceTokens.get(i-1); //resolved.add(tokens.get(i).word());
                        resolved.add(matchedLabel.word());
                        newwords+=matchedLabel.word()+" ";
                    }
                    System.out.println("converting " + token.word() + " to " + newwords);
                }
                System.out.println();
                System.out.println();
                System.out.println("-----------------------------------------------------------------");
            }
        }
        String resolvedStr ="";
        System.out.println();
        for (String str : resolved) {
            resolvedStr+=str+" ";
        }
        System.out.println(resolvedStr);
    }
The best output I was able to achieve for now is
John drove to Judy 's 's Judy 's house . John made Judy 's her dinner .
which is not very brilliant ...
I'm pretty sure there is a MUCH easier way to do what I am trying to achieve.
Ideally, I would like to reorganize the sentence as a list of CoreLabels, so that I could keep the other data they have attached to them.
Any help appreciated.
The challenge is that you need to make sure the token isn't already part of its representative mention. For example, the token "Judy" has "Judy 's" as its representative mention, so if you replace it inside the phrase "Judy 's", you end up with a doubled "'s".
You can check whether the token is part of its representative mention by comparing their indices. Only replace the token if its index is either smaller than the startIndex of the representative mention or larger than its endIndex. Otherwise, keep the token as it is.
The relevant part of your code will now look like this:
    if (token.index() < reprMent.startIndex || token.index() > reprMent.endIndex) {
        for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
            CoreLabel matchedLabel = corefSentenceTokens.get(i - 1);
            resolved.add(matchedLabel.word());
            newwords += matchedLabel.word() + " ";
        }
    }
    else {
        resolved.add(token.word());
    }
In addition, and to speed up the process, you can also replace your first if-condition by:
if (chain==null || chain.getMentionsInTextualOrder().size() == 1)
After all, if the length of the co-reference chain is just 1, there is no use looking for a representative mention.
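One caveat with the index comparison above: startIndex and endIndex are positions inside the representative mention's own sentence, so a token in a different sentence (like "He" at index 1 of the second sentence) can look as if it sits inside "John" at index 1 of the first. A minimal sketch of a stricter check follows. It assumes endIndex is one past the last token of the mention (the `i < reprMent.endIndex` loop bound suggests this) and that sentNum is 1-based. MentionSpan is a hypothetical helper written for this illustration, not a CoreNLP class.

```java
/**
 * Hypothetical helper (not part of CoreNLP): a mention described by plain ints.
 * Assumption: sentNum and startIndex are 1-based, endIndex is exclusive.
 */
final class MentionSpan {
    final int sentNum;     // 1-based sentence number of the mention
    final int startIndex;  // 1-based index of the first token in the mention
    final int endIndex;    // assumed to be one past the last token

    MentionSpan(int sentNum, int startIndex, int endIndex) {
        this.sentNum = sentNum;
        this.startIndex = startIndex;
        this.endIndex = endIndex;
    }

    /**
     * True only if the token is in the same sentence as the mention
     * and its index falls in the half-open range [startIndex, endIndex).
     */
    boolean contains(int tokenSentNum, int tokenIndex) {
        return tokenSentNum == sentNum
                && tokenIndex >= startIndex
                && tokenIndex < endIndex;
    }
}
```

In the loop from the question this would be called roughly as `new MentionSpan(reprMent.sentNum, reprMent.startIndex, reprMent.endIndex).contains(token.sentIndex() + 1, token.index())`, assuming `token.sentIndex()` is 0-based. The token is replaced only when the call returns false.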
    private void doTest(String text){
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);
        Map<Integer, CorefChain> corefs = doc.get(CorefChainAnnotation.class);
        List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
        List<String> resolved = new ArrayList<String>();
        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            for (CoreLabel token : tokens) {
                Integer corefClustId= token.get(CorefCoreAnnotations.CorefClusterIdAnnotation.class);
                System.out.println(token.word() + " --> corefClusterID = " + corefClustId);
                CorefChain chain = corefs.get(corefClustId);
                System.out.println("matched chain = " + chain);
                if(chain==null){
                    resolved.add(token.word());
                    System.out.println("Adding the same word "+token.word());
                }else{
                    int sentINdx = chain.getRepresentativeMention().sentNum -1;
                    System.out.println("sentINdx :"+sentINdx);
                    CoreMap corefSentence = sentences.get(sentINdx);
                    List<CoreLabel> corefSentenceTokens = corefSentence.get(TokensAnnotation.class);
                    String newwords = "";
                    CorefMention reprMent = chain.getRepresentativeMention();
                    System.out.println("reprMent :"+reprMent);
                    System.out.println("Token index "+token.index());
                    System.out.println("Start index "+reprMent.startIndex);
                    System.out.println("End Index "+reprMent.endIndex);
                    if (token.index() <= reprMent.startIndex || token.index() >= reprMent.endIndex) {
                        for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                            CoreLabel matchedLabel = corefSentenceTokens.get(i - 1);
                            resolved.add(matchedLabel.word().replace("'s", ""));
                            System.out.println("matchedLabel : "+matchedLabel.word());
                            newwords += matchedLabel.word() + " ";
                        }
                    }
                    else {
                        resolved.add(token.word());
                        System.out.println("token.word() : "+token.word());
                    }
                    System.out.println("converting " + token.word() + " to " + newwords);
                }
                System.out.println();
                System.out.println();
                System.out.println("-----------------------------------------------------------------");
            }
        }
        String resolvedStr ="";
        System.out.println();
        for (String str : resolved) {
            resolvedStr+=str+" ";
        }
        System.out.println(resolvedStr);
    }
This version gave the expected result for the first example, though in the second one "boy" is also replaced by "Tom":
John drove to Judy’s house. He made her dinner. -----> John drove to Judy 's house . John made Judy dinner .
Tom is a smart boy. He know a lot of thing. -----> Tom is a smart Tom . Tom know a lot of thing .