I am working on a text classification problem: I am trying to classify a collection of words into categories. Yes, there are plenty of libraries available for classification, so please don't answer if you are only suggesting that I use them.
Let me explain what I want to implement, using an example.
List of Words:
List of Categories:
Here we will train the set as follows:
Now take the phrase "The best java programming book". From this phrase, the following words match our "List of Words":
"programming" is mapped to two categories, "java" and "c-sharp", so it is a common word.
"java" is mapped to the category "java" only.
So the matching category for the phrase is "java".
This is what came to my mind. Is this solution sound, and can it be implemented? What are your suggestions? Am I missing anything, or are there any flaws?
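The approach described above can be sketched in a few lines of plain Python, without any classification library. This is a minimal sketch under assumptions: the original word and category lists were not preserved in the post, so the `WORD_CATEGORIES` mapping below is hypothetical (only "programming" and "java" come from the example; "c#" and ".net" are illustrative). Words mapped to more than one category are treated as common words and ignored, and each remaining word casts one vote for its category.

```python
from collections import Counter

# Hypothetical word -> categories mapping (the real lists were not in the post).
WORD_CATEGORIES = {
    "programming": {"java", "c-sharp"},  # common word: two categories
    "java": {"java"},
    "c#": {"c-sharp"},    # illustrative assumption
    ".net": {"c-sharp"},  # illustrative assumption
}


def classify(phrase, word_categories):
    """Return the category with the most votes from discriminating words.

    Words that map to several categories are skipped as "common" words.
    Returns None when no discriminating word is found. Ties go to the
    category whose first vote was seen first.
    """
    votes = Counter()
    for token in phrase.lower().split():
        categories = word_categories.get(token)
        if categories and len(categories) == 1:
            votes[next(iter(categories))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(classify("The best java programming book", WORD_CATEGORIES))  # java
```

One limitation of this hard rule: a phrase made only of common words (e.g. "programming book") gets no category at all, and every unique word counts equally regardless of how strongly it indicates a category.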
Of course this can be implemented. If you train a Naive Bayes classifier or a linear SVM on the right dataset (titles of Java and C# programming books, I guess), it should learn to associate the term "Java" with Java, "C#" and ".NET" with C#, and "programming" with both. In particular, a Naive Bayes classifier would likely learn a roughly even probability of Java or C# for common terms like "programming" if the dataset is divided evenly, so such words stop mattering without a hand-written "common word" rule.
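Since the question rules out libraries, here is a minimal from-scratch multinomial Naive Bayes sketch to illustrate the point above. Assumptions: the four training titles are hypothetical, add-one (Laplace) smoothing is a design choice rather than something from the answer, and words never seen in training are simply skipped. With evenly divided data, "programming" ends up with exactly equal weight for both classes.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def log_score(self, text, label):
        """log P(label) + sum of log P(word | label) over known words."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for word in text.lower().split():
            if word not in self.vocab:
                continue  # skip words never seen in training
            score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def posteriors(self, text):
        """Normalized P(label | text), computed stably from log scores."""
        scores = {c: self.log_score(text, c) for c in self.class_counts}
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def classify(self, text):
        return max(self.class_counts, key=lambda c: self.log_score(text, c))


# Hypothetical training titles, evenly divided between the two classes.
model = NaiveBayes()
model.train([
    ("Java programming for beginners", "java"),
    ("Effective Java", "java"),
    ("C# programming in depth", "c-sharp"),
    ("Pro C#", "c-sharp"),
])

print(model.classify("The best java programming book"))  # java
print(model.posteriors("programming"))  # {'java': 0.5, 'c-sharp': 0.5}
```

Unlike the keyword-voting rule, the classifier still returns a category when every word is shared between classes, and a word seen more often in one class pulls harder toward it.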