Text Classification into Categories

Question

I am working on a text classification problem: I am trying to classify a collection of words into categories. Yes, there are plenty of libraries available for classification, so please don't answer if you are only going to suggest using one of them.

Let me explain what I want to implement, using an example.

List of Words:

java
programming
language
c-sharp

List of Categories:

java
c-sharp

Here we train the set as follows:

java maps to category 1 (java)
programming maps to category 1 (java)
programming maps to category 2 (c-sharp)
language maps to category 1 (java)
language maps to category 2 (c-sharp)
c-sharp maps to category 2 (c-sharp)

Now we have the phrase "The best java programming book". From this phrase, the following words match our "List of Words":

java
programming

"programming" has two mapped categories "java" & "c-sharp" so it is a common word.

"java" is mapped to category "java" only.

So our matching category for the phrase is "java"
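The scheme described above can be sketched in a few lines of plain Python. This is only an illustration of the idea as stated in the question: the names `WORD_CATEGORIES` and `classify` are made up for this sketch, and the tokenization (lowercasing and splitting on whitespace) is an assumption. Words mapped to more than one category are treated as common and ignored, and each remaining word votes for its single category.

```python
from collections import Counter

# Training mappings from the example: word -> set of categories it maps to.
WORD_CATEGORIES = {
    "java": {"java"},
    "programming": {"java", "c-sharp"},
    "language": {"java", "c-sharp"},
    "c-sharp": {"c-sharp"},
}


def classify(phrase, word_categories=WORD_CATEGORIES):
    """Return the category with the most votes from unambiguous words, or None."""
    votes = Counter()
    for word in phrase.lower().split():
        categories = word_categories.get(word)
        if not categories:
            continue  # word is not in the list of words
        if len(categories) > 1:
            continue  # common word: mapped to several categories, so it is ignored
        (category,) = categories
        votes[category] += 1
    if not votes:
        return None  # no unambiguous word matched
    return votes.most_common(1)[0][0]


print(classify("The best java programming book"))
```

One consequence of this design is that a phrase containing only common words, such as "programming language", gets no category at all, and a phrase naming both categories equally often has no principled winner.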

This is what came to my mind. Is this solution fine, and can it be implemented? What are your suggestions? Am I missing anything, and are there any flaws?

Fred Foo · Accepted Answer

Of course this can be implemented. If you train a Naive Bayes classifier or linear SVM on the right dataset (titles of Java and C# programming books, I guess), it should learn to associate the term "Java" with Java, "C#" and ".NET" with C#, and "programming" with both. I.e., a Naive Bayes classifier would likely learn a roughly even probability of Java or C# for common terms like "programming" if the dataset is divided evenly.
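Since the question asks for an approach without libraries, here is a minimal sketch of a multinomial Naive Bayes classifier using only the Python standard library. The training titles are invented for illustration and are not a real dataset; the function names `train` and `predict`, the whitespace tokenization, and the use of add-one (Laplace) smoothing are all assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-made training titles (not real data), evenly split by class.
EXAMPLES = [
    ("java programming language", "java"),
    ("java programming basics", "java"),
    ("c-sharp programming language", "c-sharp"),
    ("c-sharp .net programming", "c-sharp"),
]


def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training are skipped
            # Add-one smoothing so unseen (word, class) pairs are not zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train(EXAMPLES)
print(predict("The best java programming book", MODEL))
```

With this toy data, "programming" occurs equally often in both classes, so it adds the same amount to each score and the decision rests on "java", which matches the behaviour described above.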

Tags:

machine-learning

classification

bayesian

Ajay Jadeja