Named Entity Resolution Algorithm

Tags: python, algorithm, machine-learning, nlp

I am trying to build an entity resolution system in which my entities are:

(i) General named entities, i.e. organization, person, location, date, time, money, and percent.
(ii) Other entities such as products and titles of persons (president, CEO, etc.).
(iii) Coreferring entities such as pronouns, determiner phrases, synonyms, string matches, demonstrative noun phrases, aliases, and appositions.

Based on various papers and other references, I have defined the scope as follows: I do not consider the ambiguity of an entity beyond its entity category. That is, I treat "Oxford" in "Oxford University" as different from "Oxford" the place, because the former is the first word of an organization entity and the latter is a location entity.
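Under that scope, a resolution key can pair the normalized surface string with its entity category, so ("Oxford University", ORG) and ("Oxford", LOC) never end up in the same cluster. A minimal sketch of that grouping step, assuming mentions arrive as hypothetical (text, category) pairs from the extractor; it only covers exact string matches within a category, not pronouns or aliases:

```python
from collections import defaultdict


def resolve_by_category(mentions):
    """Cluster mention positions whose normalized text AND category match.

    `mentions` is a hypothetical list of (text, category) pairs.
    """
    clusters = defaultdict(list)
    for position, (text, category) in enumerate(mentions):
        # Lowercase and collapse whitespace so "Oxford  University" == "oxford university".
        key = (" ".join(text.lower().split()), category)
        clusters[key].append(position)
    return dict(clusters)


mentions = [("Oxford University", "ORG"), ("Oxford", "LOC"),
            ("oxford  university", "ORG"), ("Oxford", "LOC")]
print(resolve_by_category(mentions))
```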

My task is to build a single resolution algorithm that both extracts and resolves the entities.

So, first, I am building an entity extractor. Second, for relating coreferences, various papers, such as this seminal work, use a decision-tree-based algorithm with features such as distance, i-pronoun, j-pronoun, string match, definite noun phrase, demonstrative noun phrase, number agreement, semantic class agreement, gender agreement, both proper names, alias, and apposition.
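To make the mention-pair setup concrete, here is a minimal sketch of computing several of those features for a candidate antecedent i and an anaphor j. The `Mention` record, its fields, and the word lists are hypothetical; gender, number, and semantic class are assumed to be supplied upstream, and alias and apposition are left out because they need more than string operations. It is written to run on Python 2.7 as well as Python 3.

```python
from collections import namedtuple

# Hypothetical mention record. gender, number and semclass are assumed to be
# filled in upstream (e.g. from NER tags or a lexicon); they are not computed here.
Mention = namedtuple("Mention", "text sentence gender number semclass is_proper")

PRONOUNS = {"he", "she", "it", "they", "him", "her", "his", "its", "them", "their"}
DEMONSTRATIVES = {"this", "that", "these", "those"}
ARTICLES = {"a", "an", "the"}


def normalize(text):
    """Lowercase and drop articles/demonstratives before string matching."""
    return " ".join(w for w in text.lower().split()
                    if w not in ARTICLES and w not in DEMONSTRATIVES)


def pair_features(i, j):
    """Features for candidate antecedent i and anaphor j (i precedes j)."""
    i_pron = i.text.lower() in PRONOUNS
    j_pron = j.text.lower() in PRONOUNS
    first_word_j = j.text.lower().split()[0]
    return {
        "distance": j.sentence - i.sentence,  # measured in sentences
        "i_pronoun": i_pron,
        "j_pronoun": j_pron,
        # String match is only meaningful between non-pronominal mentions.
        "string_match": not (i_pron or j_pron) and normalize(i.text) == normalize(j.text),
        "definite_np": first_word_j == "the",
        "demonstrative_np": first_word_j in DEMONSTRATIVES,
        "number_agree": i.number == j.number,
        "gender_agree": i.gender == j.gender,
        "semclass_agree": i.semclass == j.semclass,
        "both_proper": i.is_proper and j.is_proper,
    }
```

For training, dicts like these could be vectorized with scikit-learn's `DictVectorizer` and fed to a `DecisionTreeClassifier` on pairs labeled coreferent / not coreferent.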

The algorithm seems a good one; in it, entities are extracted with a Hidden Markov Model (HMM).

I was able to build an entity recognition system with an HMM. Now I am trying to build both a coreference system and an entity resolution system. My idea is that, instead of using so many features, I could annotate a corpus and train an HMM-based tagger on it directly, with a view to solving relationship extraction, for example:

"Obama/PERS is/NA delivering/NA a/NA lecture/NA in/NA Washington/LOC, he/PPERS knew/NA it/NA was/NA going/NA to/NA be/NA small/NA as/NA it/NA may/NA not/NA be/NA his/PoPERS speech/NA as/NA Mr. President/APPERS"

where:
       PERS -> PERSON
       PPERS -> PERSONAL PRONOUN TO PERSON
       PoPERS -> POSSESSIVE PRONOUN TO PERSON
       APPERS -> APPOSITIVE TO PERSON
       LOC -> LOCATION
       NA -> NOT AVAILABLE
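A minimal sketch of training an HMM tagger directly on such word/TAG annotations, in pure Python so it runs without NLTK (Python 2.7 or 3): transition and emission probabilities are estimated by counting with add-one smoothing, and tags are decoded with Viterbi. The corpus format is an assumption: every whitespace-separated token carries exactly one /TAG, and multi-word names are joined with underscores (e.g. Mr._President). NLTK also provides `nltk.tag.hmm.HiddenMarkovModelTrainer` with a `train_supervised` method for the same job.

```python
import math
from collections import Counter, defaultdict


def parse_tagged(line):
    """Turn 'word/TAG word/TAG ...' into [(word, tag), ...].

    Assumes every whitespace-separated token ends in exactly one /TAG.
    """
    return [tuple(token.rsplit("/", 1)) for token in line.split()]


class HMMTagger(object):
    """First-order HMM tagger: counted probabilities + Viterbi decoding."""

    START = "<S>"

    def __init__(self, tagged_sentences):
        self.trans = defaultdict(Counter)  # trans[previous_tag][tag]
        self.emit = defaultdict(Counter)   # emit[tag][lowercased word]
        self.vocab = set()
        for sentence in tagged_sentences:
            prev = self.START
            for word, tag in sentence:
                word = word.lower()
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.vocab.add(word)
                prev = tag
        self.tags = sorted(self.emit)

    def log_trans(self, prev, tag):
        # Add-one smoothing over the tag set.
        row = self.trans.get(prev, Counter())
        return math.log((row[tag] + 1.0) / (sum(row.values()) + len(self.tags)))

    def log_emit(self, tag, word):
        # Add-one smoothing; the extra +1 reserves mass for unseen words.
        row = self.emit[tag]
        return math.log((row[word] + 1.0) / (sum(row.values()) + len(self.vocab) + 1))

    def tag(self, words):
        if not words:
            return []
        words = [w.lower() for w in words]
        # best[t] = (log probability, tag path) of the best path ending in tag t
        best = dict((t, (self.log_trans(self.START, t) + self.log_emit(t, words[0]), [t]))
                    for t in self.tags)
        for word in words[1:]:
            step = {}
            for t in self.tags:
                score, path = max((best[p][0] + self.log_trans(p, t), best[p][1])
                                  for p in self.tags)
                step[t] = (score + self.log_emit(t, word), path + [t])
            best = step
        return list(zip(words, max(best.values())[1]))


train = [parse_tagged(s) for s in [
    "Obama/PERS is/NA in/NA Washington/LOC",
    "he/PPERS knew/NA his/PoPERS speech/NA",
]]
print(HMMTagger(train).tag("Obama is in Washington".split()))
```

With a richer tagset (PPERS, PoPERS, and so on) the tagger itself does not change; only the number of states, and therefore the amount of annotated data needed, grows.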

Would I be wrong to do this? I ran an experiment on around 10,000 words, and the early results seem encouraging. With support from a colleague, I am trying to add semantic information to the tagset, such as PERSUSPOL (PERSON OF US IN POLITICS), LOCCITUS (LOCATION CITY US), and PoPERSM (POSSESSIVE PERSON MALE), so that entity disambiguation is handled in the same pass. My feeling is that relationship extraction would be much better this way, so I would appreciate feedback on this idea too. I have also got some good results with a Naive Bayes classifier, where sentences containing predominantly one set of keywords are labeled as one class.
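The keyword-based Naive Bayes classification mentioned above can be sketched as a multinomial Naive Bayes over bag-of-words sentences with add-one smoothing. This is a from-scratch sketch (class name and example labels are hypothetical), not necessarily the exact setup used; scikit-learn's `MultinomialNB` implements the same model.

```python
import math
from collections import Counter, defaultdict


class KeywordNaiveBayes(object):
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # sentences per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            words = sentence.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, sentence):
        words = sentence.lower().split()
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(float(n) / total)  # log prior
            for w in words:
                score += math.log((counts[w] + 1.0) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = KeywordNaiveBayes().fit(
    ["the president gave a speech in washington", "the senate passed the bill",
     "the team won the match", "the striker scored a goal in the match"],
    ["politics", "politics", "sports", "sports"])
print(clf.predict("president speech senate"))
```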

If anyone can suggest a different approach, please feel free to do so.

I use Python 2.x on MS Windows, with libraries such as NLTK, scikit-learn, Gensim, pandas, NumPy, and SciPy.

Thanks in advance.


asked Apr 10 '16 20:04

Coeus2016

1 Answer

It seems that you are going down three quite different paths, each of which could be a stand-alone PhD, and there is a large body of literature on each. My first advice is to focus on your main task and outsource the rest. Even if you are developing this for a less widely studied language, you can build on others' work.

Named Entity Recognition

Stanford NLP has gone very far in this area, especially for English. Their tools recognize named entities very well, are widely used, and have a good community.

Another option may be OpenNLP for Python.

Some have tried to extend NER to unusual fine-grained types, but that requires much more training data to cover the cases, and the classification decision becomes much harder.

Edit: Stanford NER is also available through NLTK in Python.
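NLTK's `StanfordNERTagger` wrapper (which needs a local Java install plus the Stanford NER jar and model files) returns a flat list of (word, tag) pairs, with "O" for tokens outside any entity. A small helper like the hypothetical `group_entities` below can merge consecutive tokens sharing a tag into entity spans; the sample input is hand-written in that output shape, so the sketch runs without Java. One limitation: two different entities of the same type that sit directly next to each other get merged.

```python
def group_entities(tagged):
    """Merge runs of consecutive tokens with the same non-'O' tag into spans."""
    entities = []
    current_words, current_tag = [], None
    for word, tag in tagged:
        if tag != "O" and tag == current_tag:
            current_words.append(word)  # extend the current entity
            continue
        if current_tag not in (None, "O"):
            entities.append((" ".join(current_words), current_tag))
        current_words, current_tag = [word], tag
    if current_tag not in (None, "O"):  # flush an entity at the end
        entities.append((" ".join(current_words), current_tag))
    return entities


# Hand-written input in the (word, tag) shape the tagger returns.
tagged = [("Rami", "PERSON"), ("Eid", "PERSON"), ("is", "O"), ("studying", "O"),
          ("at", "O"), ("Stony", "ORGANIZATION"), ("Brook", "ORGANIZATION"),
          ("University", "ORGANIZATION"), ("in", "O"), ("NY", "LOCATION")]
print(group_entities(tagged))
```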

Named Entity Resolution/Linking/Disambiguation

This is concerned with linking a name to a knowledge base, and it solves problems such as whether "Oxford" refers to Oxford University or to the city of Oxford.
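To illustrate the basic idea of linking with context (not AIDA's or any other system's actual method), here is a toy sketch: candidates come from an alias table, optionally filtered by the NER category, and the candidate whose knowledge-base keywords overlap the sentence most wins. The knowledge base, alias table, and `link` function are all hypothetical; real systems add popularity priors and coherence between entities in the same document.

```python
# Hypothetical toy knowledge base: entity id -> category and context keywords.
KB = {
    "Oxford_University": {"category": "ORG",
                          "context": {"university", "college", "students", "professor", "degree"}},
    "Oxford_(city)": {"category": "LOC",
                      "context": {"city", "england", "river", "thames", "town"}},
}

# Hypothetical alias table: lowercased surface form -> candidate entity ids.
ALIASES = {
    "oxford": ["Oxford_University", "Oxford_(city)"],
    "oxford university": ["Oxford_University"],
}


def link(mention, context_words, category=None):
    """Return the candidate whose context keywords overlap the sentence most.

    If `category` is given (e.g. from the NER step), candidates of other
    categories are discarded first. Returns None if there is no candidate.
    """
    context = set(w.lower() for w in context_words)
    best, best_score = None, -1
    for entity in ALIASES.get(mention.lower(), []):
        if category is not None and KB[entity]["category"] != category:
            continue
        score = len(KB[entity]["context"] & context)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = entity, score
    return best


print(link("Oxford", "the river Thames flows through the city".split()))
```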

AIDA: one of the state-of-the-art systems for this. It uses context information as well as coherence information, has been extended to support several languages, and comes with a good benchmark.

Babelfy: offers an interesting API that does NER and NED for both entities and concepts. It also supports many languages, but it never worked very well.

Others include TagMe, Wikifier, etc.

Coreference Resolution

Stanford CoreNLP also has some good work in this direction. I can also recommend this work, in which coreference resolution is combined with NED.


answered Oct 12 '22 11:10

Mohamed Gad-Elrab

