How to replace a word by its most representative mention using Stanford CoreNLP Coreferences module

Question

I am trying to figure out the way to rewrite sentences by "resolving" (replacing words with) their coreferences using Stanford Corenlp's Coreference module.

The idea is to rewrite a sentence like the following :

John drove to Judy’s house. He made her dinner.

into

John drove to Judy’s house. John made Judy dinner.

Here's the code I've been fooling around with :

    private void doTest(String text){
    Annotation doc = new Annotation(text);
    pipeline.annotate(doc);


    Map<Integer, CorefChain> corefs = doc.get(CorefChainAnnotation.class);
    List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);


    List<String> resolved = new ArrayList<String>();

    for (CoreMap sentence : sentences) {

        List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

        for (CoreLabel token : tokens) {

            Integer corefClustId= token.get(CorefCoreAnnotations.CorefClusterIdAnnotation.class);
            System.out.println(token.word() +  " --> corefClusterID = " + corefClustId);


            CorefChain chain = corefs.get(corefClustId);
            System.out.println("matched chain = " + chain);


            if(chain==null){
                resolved.add(token.word());
            }else{

                int sentINdx = chain.getRepresentativeMention().sentNum -1;
                CoreMap corefSentence = sentences.get(sentINdx);
                List<CoreLabel> corefSentenceTokens = corefSentence.get(TokensAnnotation.class);

                String newwords = "";
                CorefMention reprMent = chain.getRepresentativeMention();
                System.out.println(reprMent);
                for(int i = reprMent.startIndex; i<reprMent.endIndex; i++){
                    CoreLabel matchedLabel = corefSentenceTokens.get(i-1); //resolved.add(tokens.get(i).word());
                    resolved.add(matchedLabel.word());

                    newwords+=matchedLabel.word()+" ";

                }




                System.out.println("converting " + token.word() + " to " + newwords);
            }


            System.out.println();
            System.out.println();
            System.out.println("-----------------------------------------------------------------");

        }

    }


    String resolvedStr ="";
    System.out.println();
    for (String str : resolved) {
        resolvedStr+=str+" ";
    }
    System.out.println(resolvedStr);


}

The best output I was able to achieve for now is

John drove to Judy 's 's Judy 's house . John made Judy 's her dinner .

which is not very brilliant ...

I'm pretty sure there is a MUCH easier way to do what I am trying to achieve.

Ideally, I would like to reorganize the sentence as a list of CoreLabels, so that I could keep the other data they have attached to them.

Any help appreciated.

yvespeirsman · Accepted Answer

The challenge is you need to make sure that the token isn't part of its representative mention. For example, the token "Judy" has "Judy 's" as its representative mention, so if you replace it in the phrase "Judy 's", you'll end up with the double "'s".

You can check if the token is part of its representative mention by comparing their indices. You should only replace the token if its index is either smaller than the startIndex of the representative mention, or larger than the endIndex of the representative mention. Otherwise you just keep the token.

The relevant part of your code will now look like this:

            if (token.index() < reprMent.startIndex || token.index() > reprMent.endIndex) {

                for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                    CoreLabel matchedLabel = corefSentenceTokens.get(i - 1); 
                    resolved.add(matchedLabel.word());

                    newwords += matchedLabel.word() + " ";

                }
            }

            else {
                resolved.add(token.word());

            }

In addition, and to speed up the process, you can also replace your first if-condition by:

if (chain==null || chain.getMentionsInTextualOrder().size() == 1)

After all, if the length of the co-reference chain is just 1, there is no use looking for a representative mention.

Deepa Huddar · Answer

private void doTest(String text){
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation doc = new Annotation(text);
    pipeline.annotate(doc);


    Map<Integer, CorefChain> corefs = doc.get(CorefChainAnnotation.class);
    List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);


    List<String> resolved = new ArrayList<String>();

    for (CoreMap sentence : sentences) {

        List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

        for (CoreLabel token : tokens) {

            Integer corefClustId= token.get(CorefCoreAnnotations.CorefClusterIdAnnotation.class);
            System.out.println(token.word() +  " --> corefClusterID = " + corefClustId);


            CorefChain chain = corefs.get(corefClustId);
            System.out.println("matched chain = " + chain);


            if(chain==null){
                resolved.add(token.word());
                System.out.println("Adding the same word "+token.word());
            }else{

                int sentINdx = chain.getRepresentativeMention().sentNum -1;
                System.out.println("sentINdx :"+sentINdx);
                CoreMap corefSentence = sentences.get(sentINdx);
                List<CoreLabel> corefSentenceTokens = corefSentence.get(TokensAnnotation.class);
                String newwords = "";
                CorefMention reprMent = chain.getRepresentativeMention();
                System.out.println("reprMent :"+reprMent);
                System.out.println("Token index "+token.index());
                System.out.println("Start index "+reprMent.startIndex);
                System.out.println("End Index "+reprMent.endIndex);
                if (token.index() <= reprMent.startIndex || token.index() >= reprMent.endIndex) {

                        for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                            CoreLabel matchedLabel = corefSentenceTokens.get(i - 1); 
                            resolved.add(matchedLabel.word().replace("'s", ""));
                            System.out.println("matchedLabel : "+matchedLabel.word());
                            newwords += matchedLabel.word() + " ";

                        }
                    }

                    else {
                        resolved.add(token.word());
                        System.out.println("token.word() : "+token.word());
                    }



                System.out.println("converting " + token.word() + " to " + newwords);
            }


            System.out.println();
            System.out.println();
            System.out.println("-----------------------------------------------------------------");

        }

    }


    String resolvedStr ="";
    System.out.println();
    for (String str : resolved) {
        resolvedStr+=str+" ";
    }
    System.out.println(resolvedStr);


}

Gave perfect answer.

John drove to Judy’s house. He made her dinner. -----> John drove to Judy 's house . John made Judy dinner . Tom is a smart boy. He know a lot of thing. -----> Tom is a smart Tom . Tom know a lot of thing .

How to replace a word by its most representative mention using Stanford CoreNLP Coreferences module

Tags:

java

nlp

stanford-nlp

azpublic

2 Answers

yvespeirsman

Deepa Huddar

Recent Activity

Donate For Us

How to replace a word by its most representative mention using Stanford CoreNLP Coreferences module

Tags:

java

nlp

stanford-nlp

azpublic

2 Answers

yvespeirsman

Deepa Huddar

Related questions

Recent Activity

Donate For Us