My problem is to filter out all the names of persons in a table, so that only the names of companies, schools, and other institutions are left in the database.
I tried a simple solution. I was given a list of names of companies, schools, etc., and I searched that list for the most common terms. (Note: I did not search for the common strings within a name, since that would cost a lot.) I assigned weights to those terms, and also to the most common substrings. With that, if a string contains corp, inc, school, or univ, it is very likely not the name of a person.
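A minimal C# sketch of the weighted-term idea described above, assuming a hand-picked weight table and a score threshold (both hypothetical; real weights would come from term frequencies in the list of known organization names). Each token contributes the largest weight among the terms it starts with, a cheap stand-in for the substring weights, so "univ" also covers "university".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical weights. In practice they would be derived from how often each
// term appears in the list of known company/school/agency names.
var termWeights = new Dictionary<string, double>
{
    ["corp"] = 3.0,       // corp, corporation
    ["inc"] = 3.0,        // inc, incorporated
    ["univ"] = 3.0,       // univ, university
    ["school"] = 3.0,
    ["college"] = 3.0,
    ["department"] = 2.0,
    ["agency"] = 2.0,
};

// Hypothetical cut-off: a score at or above this means "not a person".
const double Threshold = 2.0;

// Lower-case the name and split it into word tokens, dropping punctuation.
string[] Tokenize(string name) =>
    name.ToLowerInvariant()
        .Split(new[] { ' ', '.', ',', '-', '&', '(', ')' }, StringSplitOptions.RemoveEmptyEntries);

// Each token contributes the largest weight among the terms it starts with;
// tokens that match no term contribute 0.
double OrganizationScore(string name) =>
    Tokenize(name).Sum(token => termWeights
        .Where(kv => token.StartsWith(kv.Key, StringComparison.Ordinal))
        .Select(kv => kv.Value)
        .DefaultIfEmpty(0.0)
        .Max());

bool LooksLikePerson(string name) => OrganizationScore(name) < Threshold;

foreach (var name in new[] { "XYZ Brewery Corporation", "Harvard University", "Department of Health", "John Smith" })
{
    Console.WriteLine($"{name}: score = {OrganizationScore(name)}, person = {LooksLikePerson(name)}");
}
```

Prefix matching keeps the lookup cheap, but it can also fire on ordinary words (for example "income" starts with "inc"), which is one reason a learned model tends to do better than fixed weights.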
Now, my problem is how to turn this into an AI. Moreover, I need to make it easy to classify companies only, schools only, etc.
For example:
XYZ Brewery Corporation -> company
Harvard University -> school
Department of Health -> government agency
The only AI techniques I know are Naive Bayes, K-Means, hierarchical clustering, FCM, and ANN. Those techniques commonly take numerical values as input, so I don't know how to turn this into an AI. The only techniques I know that handle strings extensively are Levenshtein distance, stemming, Needleman-Wunsch, and Jaro-Winkler.
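Naive Bayes does not actually need hand-made numeric features here: a multinomial Naive Bayes classifier works directly on token counts (a bag of words), which also gives the multi-class labels from the examples above. The sketch below is a minimal illustration with add-one (Laplace) smoothing; the tiny training set is hypothetical, and a real one would be built from the company/school/agency lists plus a list of personal names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical labelled examples.
var training = new (string Text, string Label)[]
{
    ("XYZ Brewery Corporation", "company"), ("Acme Steel Inc", "company"),
    ("Global Foods Corporation", "company"), ("Harvard University", "school"),
    ("Springfield High School", "school"), ("Stanford University", "school"),
    ("Department of Health", "government agency"), ("Department of Education", "government agency"),
    ("Bureau of Customs", "government agency"), ("John Smith", "person"),
    ("Maria Santos", "person"), ("Robert Johnson", "person"),
};

string[] Tokenize(string text) =>
    text.ToLowerInvariant().Split(new[] { ' ', '.', ',', '-', '&' }, StringSplitOptions.RemoveEmptyEntries);

int Get(Dictionary<string, int> counts, string key) => counts.TryGetValue(key, out var n) ? n : 0;

// Training: count labels, and count tokens per label (the bag-of-words numbers).
var labelCounts = new Dictionary<string, int>();
var tokenCounts = new Dictionary<string, Dictionary<string, int>>();
var totalTokens = new Dictionary<string, int>();
var vocabulary = new HashSet<string>();

foreach (var (text, label) in training)
{
    labelCounts[label] = Get(labelCounts, label) + 1;
    if (!tokenCounts.ContainsKey(label))
    {
        tokenCounts[label] = new Dictionary<string, int>();
        totalTokens[label] = 0;
    }
    foreach (var token in Tokenize(text))
    {
        tokenCounts[label][token] = Get(tokenCounts[label], token) + 1;
        totalTokens[label]++;
        vocabulary.Add(token);
    }
}

// Classification: pick the label maximizing log P(label) + sum of log P(token | label).
string Classify(string text)
{
    string best = "";
    double bestScore = double.NegativeInfinity;
    foreach (var label in labelCounts.Keys)
    {
        double score = Math.Log((double)labelCounts[label] / training.Length);
        foreach (var token in Tokenize(text))
        {
            // Add-one smoothing so an unseen token does not make the probability zero.
            int count = Get(tokenCounts[label], token);
            score += Math.Log((count + 1.0) / (totalTokens[label] + vocabulary.Count));
        }
        if (score > bestScore) { bestScore = score; best = label; }
    }
    return best;
}

Console.WriteLine($"Yale University -> {Classify("Yale University")}");                     // school
Console.WriteLine($"Department of Agriculture -> {Classify("Department of Agriculture")}"); // government agency
Console.WriteLine($"Jane Smith -> {Classify("Jane Smith")}");                               // person
```

Sums of logarithms are used instead of products of probabilities to avoid floating-point underflow on longer names. The same token counts could also serve as input vectors for K-Means or an ANN.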
Is my first approach incorrect? How can I incorporate the techniques that I know? Do I have to learn a new technique? I'm basically new to AI since I am still a student. However, this is not an assignment; it's for a company project (I am the only computer science major in our group, so it's very heavy on my part). By the way, if you are curious about what language I use, I am using C#, since I plan to make it a stand-alone application and the users are on Windows.
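One way to fold Levenshtein distance, which the question already lists, into the keyword approach is to let a token count as a keyword when it is within a small edit distance of it, so misspellings such as "Corpration" still match. This is a sketch only; the keyword list and the tolerance of 2 are hypothetical choices.

```csharp
using System;
using System.Linq;

// Dynamic-programming Levenshtein distance: insert, delete, and substitute each cost 1.
// Only two rows of the DP table are kept in memory.
int Levenshtein(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j;
    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // the row just filled becomes "prev"
    }
    return prev[b.Length];
}

// Hypothetical keyword list and tolerance. Short keywords such as "inc" would need
// a tolerance of 0, otherwise ordinary short words start matching them.
string[] keywords = { "corporation", "university", "school", "department" };
const int MaxDistance = 2;

// Returns the closest keyword if it is within MaxDistance edits, otherwise null.
string? NearestKeyword(string token)
{
    var t = token.ToLowerInvariant();
    var best = keywords.OrderBy(k => Levenshtein(t, k)).First();
    return Levenshtein(t, best) <= MaxDistance ? best : null;
}

Console.WriteLine(NearestKeyword("Corpration") ?? "(no match)"); // corporation
Console.WriteLine(NearestKeyword("Smith") ?? "(no match)");      // (no match)
```

Jaro-Winkler could be swapped in for Levenshtein with the comparison reversed, since it is a similarity (higher is closer) rather than a distance.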
This problem is generally called Named Entity Recognition (NER). The SharpNLP project is a C# library of NLP algorithms, including NER. It seems to be completely undocumented, though it's a C# port of Apache's OpenNLP, which has documentation on name finding; SharpNLP's interface is presumably similar.