In NLP there is the concept of a gazetteer, which can be quite useful for creating annotations. As far as I understand:
A gazetteer consists of a set of lists containing names of entities such as cities, organisations, days of the week, etc. These lists are used to find occurrences of these names in text, e.g. for the task of named entity recognition.
So it is essentially a lookup. Isn't this kind of a cheat? If we use a gazetteer
for detecting named entities, then there is not much natural language processing
going on. Ideally, I would want to detect named entities using NLP
techniques. Otherwise, how is it any better than a regex pattern matcher?
Does that make sense?
In the geographical sense, a gazetteer typically contains information about the geographical makeup, social statistics and physical features of a country, region, or continent. Its content can include a subject's location, dimensions of peaks and waterways, population, gross domestic product and literacy rate.
How does named entity recognition work? NER scans the whole text and detects named entities. It begins by detecting the sentence boundaries in a given document based on capitalization rules; knowing the sentence boundaries helps NER find and extract relevant information from the document in the next steps.
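To make the "essentially a lookup" point concrete, here is a minimal sketch of a gazetteer tagger in Python. It is not taken from any particular toolkit: the toy `GAZETTEER` lists, the helper names (`build_index`, `tokenize`, `gazetteer_tag`) and the longest-match strategy are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy gazetteer: entity type -> list of names (not a real resource).
GAZETTEER = {
    "CITY": ["New York", "London", "York"],
    "ORG": ["United Nations"],
    "DAY": ["Monday", "Friday"],
}

def build_index(gazetteer):
    """Map each name, as a tuple of lowercased tokens, to its entity type."""
    index = {}
    for label, names in gazetteer.items():
        for name in names:
            index[tuple(name.lower().split())] = label
    return index

def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def gazetteer_tag(text, index):
    """Return (matched text, label) pairs, preferring the longest match at each position."""
    tokens = tokenize(text)
    max_len = max(len(key) for key in index)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in index:
                matches.append((" ".join(tokens[i:i + n]), index[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return matches

INDEX = build_index(GAZETTEER)
print(gazetteer_tag("The United Nations met in New York on Monday.", INDEX))
# [('United Nations', 'ORG'), ('New York', 'CITY'), ('Monday', 'DAY')]
```

Unlike a single regex, the token-level longest match keeps "New York" from also being tagged as "York". It still has no notion of context, though: "York" used as a surname would be tagged as a city all the same.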
It depends on how you build and use your gazetteer. If you are presenting experiments in a closed domain and you hand-picked your gazetteer, then yes, you are cheating. If you are using an openly available gazetteer and running experiments on a large dataset, or using it in an application in the wild where you don't control the input, then you are fine. We found ourselves in a similar situation: we partition our dataset and use the training data to build our gazetteers automatically. As long as you report your methodology, you should not feel like you are cheating (let the reviewers complain).
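The "partition the dataset and build the gazetteer from the training data" idea could look roughly like the sketch below. This is not our actual pipeline: the toy `DATA`, the function names and the simple coverage measure are all hypothetical, and the only point is that gazetteer entries come from the training split alone.

```python
import random
from collections import defaultdict

# Hypothetical annotated data: (sentence, [(entity text, label), ...]).
DATA = [
    ("Alice flew to Paris", [("Paris", "CITY")]),
    ("Bob works at Acme", [("Acme", "ORG")]),
    ("Paris is busy on Friday", [("Paris", "CITY"), ("Friday", "DAY")]),
    ("Carol moved to Berlin", [("Berlin", "CITY")]),
    ("Acme opened in Berlin", [("Acme", "ORG"), ("Berlin", "CITY")]),
    ("Dave visits Oslo on Friday", [("Oslo", "CITY"), ("Friday", "DAY")]),
]

def split_data(data, test_fraction=0.33, seed=0):
    """Shuffle with a fixed seed, then partition into (train, test)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = max(1, round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

def build_gazetteer(train):
    """Collect entity names per label from the training annotations only."""
    gazetteer = defaultdict(set)
    for _, entities in train:
        for name, label in entities:
            gazetteer[label].add(name.lower())
    return gazetteer

def coverage(gazetteer, test):
    """Fraction of test entities found in the gazetteer under the same label."""
    total = hits = 0
    for _, entities in test:
        for name, label in entities:
            total += 1
            hits += name.lower() in gazetteer.get(label, set())
    return hits / total if total else 0.0

train, test = split_data(DATA)
gaz = build_gazetteer(train)
print(f"{len(train)} train / {len(test)} test sentences")
print(f"held-out coverage: {coverage(gaz, test):.2f}")
```

Building the gazetteer from all of `DATA` instead would trivially give a coverage of 1.0 on the test split, which is exactly the kind of leak that would count as cheating.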