How can one resolve synonyms in named-entity recognition?

Tags: nlp, nltk, named-entity-recognition

In natural language processing, named-entity recognition is the challenge of, well, recognizing named entities such as organizations, places, and, most importantly, names.

There is a major challenge in this, though, which I call synonymy: "The Count" and "Dracula" in fact refer to the same person, but it is possible that this is never stated directly in the text.

What would be the best algorithm to resolve these synonyms?

If there is a feature for this in any Python-based library, I'm eager to be educated. I'm using NLTK.

asked Apr 05 '13 13:04

Sean Allred

1 Answer

You are describing a problem of coreference resolution and named entity linking. I'm providing separate links as I am not entirely sure which one you meant.

Coreference: Stanford CoreNLP currently has one of the best implementations, but it is written in Java. I have used the Python bindings and wasn't too happy with them; I ended up running all my data through the Stanford pipeline just once and then loading the processed XML files in Python. Obviously, that doesn't work if you have to process data in real time.
Named entity linking: Check out Apache Stanbol and the links in the following Stack Overflow post.
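
To give an idea of the "run the pipeline once, load the XML in Python" workflow, here is a minimal sketch using only the standard library. The sample XML is hand-written for illustration, and its layout (nested coreference elements, mention elements with a representative attribute and a text child) is my assumption about what the CoreNLP XML output looks like; check it against your own files, since some versions may not include the mention text. The function name load_alias_map is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample imitating the coreference section of a CoreNLP XML file.
# The element names and nesting are an assumption; verify against real output.
SAMPLE_XML = """\
<root>
  <document>
    <coreference>
      <coreference>
        <mention representative="true">
          <sentence>1</sentence><start>1</start><end>2</end><head>1</head>
          <text>Dracula</text>
        </mention>
        <mention>
          <sentence>2</sentence><start>1</start><end>3</end><head>2</head>
          <text>The Count</text>
        </mention>
      </coreference>
    </coreference>
  </document>
</root>
"""


def load_alias_map(xml_text):
    """Map each mention's text to the representative mention of its chain."""
    root = ET.fromstring(xml_text)
    aliases = {}
    # Each inner <coreference> element is treated as one chain of mentions.
    for chain in root.findall("./document/coreference/coreference"):
        mentions = chain.findall("mention")
        if not mentions:
            continue
        # Prefer the mention flagged as representative; fall back to the first.
        rep = next(
            (m for m in mentions if m.get("representative") == "true"),
            mentions[0],
        )
        canonical = rep.findtext("text", default="").strip()
        for mention in mentions:
            aliases[mention.findtext("text", default="").strip()] = canonical
    return aliases


alias_map = load_alias_map(SAMPLE_XML)
print(alias_map)
```

With a map like this you can replace every alias found by your NER step with its canonical name before counting or grouping entities. Real files would be read with ET.parse(path).getroot() instead of ET.fromstring.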

answered Oct 11 '22 03:10

mbatchkarov
