 

AI - String/Text Classification/Categorization (e.g. a string/text is classified as a company name)

My task is to filter out all the names of persons from a table, so that only the names of companies, schools, and institutions are left in the database.

I tried a simple solution. I was given a list of names of companies, schools, etc., and I searched it for the most common terms. (Note: I did not search for common strings within a name, since that would be expensive.) I assigned weights to those terms, and also to the most common substrings. With that, if a string contains corp, inc, school, or univ, it is very likely not the name of a person.
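A minimal sketch of the weighted-term scoring described above. It is written in Python for brevity even though the project is in C#. The term weights and the threshold are hypothetical placeholders. In the described approach they would be derived from how often each term appears in the list of organization names.

```python
import re

# Hypothetical weights: in practice these would be mined from the list of
# known organization names (more frequent term -> higher weight).
ORG_TERM_WEIGHTS = {
    "corp": 3.0, "corporation": 3.0, "inc": 3.0,
    "school": 2.5, "univ": 2.5, "university": 2.5,
    "department": 2.0, "of": 0.5,
}

THRESHOLD = 2.0  # hypothetical cut-off, to be tuned on labeled data


def tokenize(name: str) -> list[str]:
    """Lower-case the name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def org_score(name: str) -> float:
    """Sum the weights of all known organization terms found in the name."""
    return sum(ORG_TERM_WEIGHTS.get(tok, 0.0) for tok in tokenize(name))


def is_person(name: str) -> bool:
    """Treat a name as a person's name if its organization score is low."""
    return org_score(name) < THRESHOLD
```

This approach is a linear classifier with hand-set weights. The steps further down replace the hand-set weights with ones learned from data.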

Now my problem is how to turn this into an AI approach. Moreover, I need to design it so that classifying into companies only, schools only, etc. is easy.

For example:

XYZ Brewery Corporation -> company
Harvard University -> school
Department of Health -> government agency
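One way to make per-category classification easier is to keep a separate weight table for each category and pick the highest-scoring one. If no category scores high enough, the name falls back to "person". This is a sketch with assumed weights and an assumed `MIN_SCORE` cut-off, not a tuned classifier.

```python
import re

# Hypothetical per-category term weights. Each category has its own table,
# so a new category can be added without touching the others.
CATEGORY_TERMS = {
    "company": {"corp": 3.0, "corporation": 3.0, "inc": 3.0, "brewery": 1.5},
    "school": {"school": 3.0, "university": 3.0, "univ": 3.0, "college": 3.0},
    "government agency": {"department": 2.5, "ministry": 2.5, "bureau": 2.5},
}

MIN_SCORE = 1.0  # hypothetical: below this, no category matched strongly


def classify(name: str) -> str:
    """Return the best-scoring category, or "person" if none is strong enough."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    scores = {
        category: sum(weights.get(t, 0.0) for t in tokens)
        for category, weights in CATEGORY_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= MIN_SCORE else "person"
```

For example, `classify("Harvard University")` returns `"school"`, and `classify("John Smith")` returns `"person"`.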

The only AI techniques I know are Naive Bayes, K-Means, hierarchical clustering, FCM, and ANNs. Those techniques usually take numerical values as input, so I don't know how to apply them here. The only techniques I know that handle strings extensively are Levenshtein distance, stemming, Needleman-Wunsch, and Jaro-Winkler.
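A common way to give strings to a numeric method such as Naive Bayes is to turn each name into word counts (a "bag of words"). The sketch below implements multinomial Naive Bayes with Laplace smoothing from scratch. The training pairs are hypothetical, and a real model would need a much larger labeled sample drawn from the actual table.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesNameClassifier:
    """Multinomial Naive Bayes over word tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of names per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (name, label) pairs."""
        for name, label in examples:
            self.class_counts[label] += 1
            for tok in tokenize(name):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, name):
        """Return the label with the highest log posterior."""
        total_names = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, n in self.class_counts.items():
            logp = math.log(n / total_names)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(name):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                logp += math.log((count + self.alpha) / (total_words + self.alpha * v))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical labeled sample.
train = [
    ("XYZ Brewery Corporation", "company"),
    ("Acme Inc", "company"),
    ("Harvard University", "school"),
    ("Springfield High School", "school"),
    ("Department of Health", "government agency"),
    ("Department of Education", "government agency"),
    ("John Smith", "person"),
    ("Maria Santos", "person"),
]
model = NaiveBayesNameClassifier().fit(train)
```

The model learns the per-term weights from labeled examples, so you no longer set them by hand. Adding a category only means adding labeled examples for it.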

Is my first approach incorrect? How can I incorporate the techniques that I know? Do I have to learn a new technique? I'm basically new to AI, since I am still a student. However, this is not an assignment but a company project (I am actually the only computer science major in our group, so it's very heavy on my part). By the way, if you are curious about which language I use: I am using C#, since I plan to make this a stand-alone application and the users are on Windows.
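String-distance measures can complement the keyword approach rather than replace it. For example, Levenshtein distance can map a misspelled token such as "Corporatoin" onto a known keyword before scoring. This is a sketch, and the keyword list and the 20% tolerance below are hypothetical.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


# Hypothetical keyword list, e.g. the weighted terms mined earlier.
KEYWORDS = ["corporation", "university", "department", "school", "inc"]


def match_keyword(token: str, max_ratio: float = 0.2):
    """Return the closest keyword within max_ratio * len(keyword) edits, else None."""
    best, best_dist = None, None
    for kw in KEYWORDS:
        d = levenshtein(token, kw)
        if d <= max_ratio * len(kw) and (best_dist is None or d < best_dist):
            best, best_dist = kw, d
    return best


def normalize(name: str) -> list[str]:
    """Replace near-miss tokens with the keyword they most likely mean."""
    return [match_keyword(tok) or tok
            for tok in re.findall(r"[a-z0-9]+", name.lower())]
```

The normalized tokens can then go into either keyword scoring or the Naive Bayes model, so typos in the table do not hide an organization name.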

asked Nov 26 '25 at 06:11 by JinShin


1 Answer

This problem is generally called Named Entity Recognition (NER). The SharpNLP project is a C# library of NLP algorithms, including NER. It seems to be completely undocumented, though it's a C# port of Apache's OpenNLP, which has documentation on name finding; SharpNLP's interface is presumably similar.

answered Nov 27 '25 at 20:11 by Danica


