Multi-term named entities in Stanford Named Entity Recognizer

Question

I'm using the Stanford Named Entity Recognizer (http://nlp.stanford.edu/software/CRF-NER.shtml) and it's working fine. This is my code:

    List<List<CoreLabel>> out = classifier.classify(text);
    for (List<CoreLabel> sentence : out) {
        for (CoreLabel word : sentence) {
            if (!StringUtils.equals(word.get(AnswerAnnotation.class), "O")) {
                namedEntities.add(word.word().trim());
            }
        }
    }

However, I'm having trouble with first names and surnames. When the recognizer encounters "Joe Smith", it returns "Joe" and "Smith" separately. I'd really like it to return "Joe Smith" as one term.

Can this be achieved through the recognizer itself, perhaps with a configuration option? I haven't found anything in the Javadoc so far.

Thanks!

Christopher Manning · Accepted Answer

This is because your inner for loop is iterating over individual tokens (words) and adding them separately. You need to change things to add whole names at once.

One way is to replace the inner for-each loop with an index-based for loop that has a while loop inside it; the while loop takes adjacent non-O tokens of the same class and adds them as a single entity.*

Another way would be to use the CRFClassifier method call:

    List<Triple<String,Integer,Integer>> classifyToCharacterOffsets(String sentences)

This gives you whole entities; you can get the String form of each one by calling substring on the original input with the returned offsets.
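As a rough illustration of the substring step, here is a minimal sketch that does not depend on the Stanford classes. The `Span` class is a hypothetical stand-in for `Triple<String,Integer,Integer>`; it assumes each triple holds an entity type followed by begin and end character offsets, with the end offset exclusive. The sample offsets are written by hand, not produced by a classifier.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetExtraction {

    // Hypothetical stand-in for Triple<String,Integer,Integer>:
    // (label, start offset, end offset); the end offset is assumed exclusive.
    static final class Span {
        final String label;
        final int start;
        final int end;

        Span(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    // Cuts each span out of the original text and pairs it with its label.
    static List<String> extract(String text, List<Span> spans) {
        List<String> result = new ArrayList<String>();
        for (Span s : spans) {
            result.add(s.label + ": " + text.substring(s.start, s.end));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Joe Smith lives in New York.";
        List<Span> spans = new ArrayList<Span>();
        spans.add(new Span("PERSON", 0, 9));      // covers "Joe Smith"
        spans.add(new Span("LOCATION", 19, 27));  // covers "New York"
        for (String line : extract(text, spans)) {
            System.out.println(line);
        }
    }
}
```

With the real classifier, the spans would come from `classifier.classifyToCharacterOffsets(text)` and the three fields would be read with the triple's accessors instead of `Span`.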

*The models that we distribute use a simple raw IO label scheme, where tokens are labeled PERSON or LOCATION, and the appropriate thing to do is simply to coalesce adjacent tokens with the same label. Many NER systems use more complex labels such as IOB labels, where codes like B-PERS indicate where a person entity starts. The CRFClassifier class and feature factories support such labels, but they're not used in the models we currently distribute (as of 2012).
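The coalescing of adjacent same-label tokens under an IO scheme can be sketched on plain lists, without the Stanford classes. This is only an illustration: the class name `IoCoalescer`, the method `coalesce`, and the hand-written tokens and labels are all assumptions, not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IoCoalescer {

    // Joins runs of adjacent tokens that share the same non-"O" label.
    // Each result entry has the form "LABEL: token token ...".
    static List<String> coalesce(List<String> tokens, List<String> labels) {
        List<String> entities = new ArrayList<String>();
        int i = 0;
        while (i < tokens.size()) {
            String label = labels.get(i);
            if (label.equals("O")) {
                i++;               // not part of any entity
                continue;
            }
            // Start a new entity, then absorb following tokens with the same label.
            StringBuilder entity = new StringBuilder(tokens.get(i));
            i++;
            while (i < tokens.size() && labels.get(i).equals(label)) {
                entity.append(' ').append(tokens.get(i));
                i++;
            }
            entities.add(label + ": " + entity);
        }
        return entities;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Joe", "Smith", "visited", "New", "York");
        List<String> labels = Arrays.asList("PERSON", "PERSON", "O", "LOCATION", "LOCATION");
        // prints [PERSON: Joe Smith, LOCATION: New York]
        System.out.println(coalesce(tokens, labels));
    }
}
```

One consequence of plain IO labels is that two distinct entities of the same type standing directly next to each other are merged into one, because nothing marks where the second one begins. That is the case IOB-style labels are designed to handle.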

Remi Mélisson · Answer

The drawback of the classifyToCharacterOffsets method is that (AFAIK) you can't access the label of the entities.

As proposed by Christopher, here is an example of a loop which assembles "adjacent non-O things". This example also counts the number of occurrences.

    // Needs java.util.HashMap, java.util.List,
    // edu.stanford.nlp.ling.CoreLabel and edu.stanford.nlp.ling.CoreAnnotations.
    public HashMap<String, HashMap<String, Integer>> extractEntities(String text) {

        // label -> (entity text -> number of occurrences)
        HashMap<String, HashMap<String, Integer>> entities =
                new HashMap<String, HashMap<String, Integer>>();

        for (List<CoreLabel> lcl : classifier.classify(text)) {

            String label = "O";   // label of the run being assembled ("O" = none)
            String value = "";    // tokens of that run, joined by spaces

            // Go one position past the end so that a run ending on the last
            // token of the sentence is recorded too.
            for (int i = 0; i <= lcl.size(); i++) {
                String answer = (i < lcl.size())
                        ? lcl.get(i).getString(CoreAnnotations.AnswerAnnotation.class)
                        : "O";

                if (answer.equals(label)) {
                    // Same label as the previous token: extend the current run.
                    if (!label.equals("O"))
                        value = value + " " +
                                lcl.get(i).getString(CoreAnnotations.ValueAnnotation.class);
                    continue;
                }

                // The label changed: record the finished run, if there was one.
                if (!label.equals("O")) {
                    if (!entities.containsKey(label))
                        entities.put(label, new HashMap<String, Integer>());

                    Integer count = entities.get(label).get(value);
                    entities.get(label).put(value, count == null ? 1 : count + 1);
                }

                // Start a new run, or none if the new label is "O".
                label = answer;
                value = answer.equals("O") ? ""
                        : lcl.get(i).getString(CoreAnnotations.ValueAnnotation.class);
            }
        }

        return entities;
    }
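To show how the nested map returned by extractEntities might be read, here is a small self-contained sketch. The `EntityCounts` class, its `count` helper, and the sample data are hypothetical; the map is built by hand in the same shape (label, then entity text, then count) and does not come from a classifier.

```java
import java.util.HashMap;
import java.util.Map;

class EntityCounts {

    // Returns how often `value` was seen with `label`, or 0 if never.
    static int count(Map<String, HashMap<String, Integer>> entities,
                     String label, String value) {
        HashMap<String, Integer> byValue = entities.get(label);
        if (byValue == null) {
            return 0;
        }
        Integer n = byValue.get(value);
        return n == null ? 0 : n;
    }

    public static void main(String[] args) {
        // Hand-built sample in the shape extractEntities returns.
        HashMap<String, HashMap<String, Integer>> entities =
                new HashMap<String, HashMap<String, Integer>>();
        HashMap<String, Integer> persons = new HashMap<String, Integer>();
        persons.put("Joe Smith", 2);
        entities.put("PERSON", persons);

        // Walk every label and every entity recorded under it.
        for (Map.Entry<String, HashMap<String, Integer>> byLabel : entities.entrySet()) {
            for (Map.Entry<String, Integer> e : byLabel.getValue().entrySet()) {
                System.out.println(byLabel.getKey() + "\t" + e.getKey() + "\t" + e.getValue());
            }
        }

        System.out.println(count(entities, "PERSON", "Joe Smith"));  // prints 2
        System.out.println(count(entities, "LOCATION", "Paris"));    // prints 0
    }
}
```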

Tags: nlp, stanford-nlp, named-entity-recognition

Krt_Malta
