I am trying to do text classification with Naive Bayes using the Weka library in my Java code, but I think the classification results are incorrect and I don't know what the problem is. I use ARFF files as input.
This is my training data:
@relation hamspam
@attribute text string
@attribute class {spam,ham}
@data
'good',ham
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
'good good bad',ham
This is my testing data:
@relation test
@attribute text string
@attribute class {spam,ham}
@data
'good bad very bad',?
'good bad very bad',?
'good',?
'good very good',?
'bad',?
'very good',?
'very very good',?
And this is my code:
public static void NaiveBayes(String training_file, String testing_file) throws FileNotFoundException, IOException, Exception{
    //filter
    StringToWordVector filter = new StringToWordVector();
    Classifier naive = new NaiveBayes();
    //training data
    Instances train = new Instances(new BufferedReader(new FileReader(training_file)));
    int lastIndex = train.numAttributes() - 1;
    train.setClassIndex(lastIndex);
    filter.setInputFormat(train);
    train = Filter.useFilter(train, filter);
    //testing data
    Instances test = new Instances(new BufferedReader(new FileReader(testing_file)));
    test.setClassIndex(lastIndex);
    filter.setInputFormat(test);
    Instances test2 = Filter.useFilter(test, filter);
    naive.buildClassifier(train);
    for(int i=0; i<test2.numInstances(); i++) {
        System.out.println(test.instance(i));
        double index = naive.classifyInstance(test2.instance(i));
        String className = train.attribute(0).value((int)index);
        System.out.println(className);
    }
}
The result indicates that the data that should have been classified as spam is classified as ham, and the data that should have been classified as ham is classified as spam. What is the problem? Please help.
Naive Bayes classifiers have been heavily used for text classification and text analysis machine learning problems. Text Analysis is a major application field for machine learning algorithms.
Naive Bayes classifiers are a family of classification algorithms built on Bayes' theorem. They share a common assumption: the features are conditionally independent of one another given the class.
What is the Naive Bayes algorithm? It is a classification technique based on Bayes' theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that, within a class, the presence of a particular feature is unrelated to the presence of any other feature.
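To make the independence assumption concrete, here is a minimal plain-Java sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, trained on the seven training rows from the question. It is only an illustration: the class `TinyNaiveBayes` and all of its methods are hypothetical names, the tokenizer is a simple assumption, and this is not how Weka's `NaiveBayes` class is implemented. Each class score is the log prior plus the sum of per-word log likelihoods, which is exactly where the "features are independent given the class" assumption enters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, minimal multinomial Naive Bayes. Not Weka's implementation. */
final class TinyNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Assumed tokenizer: lower-case, split on anything that is not a letter. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Updates the document and word counts with one labeled example. */
    void train(String text, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
            wordsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** log P(label) + sum over words of log P(word | label), with add-one smoothing. */
    double logScore(String text, String label) {
        double score = Math.log((double) docsPerClass.get(label) / totalDocs);
        Map<String, Integer> counts = wordCounts.get(label);
        int total = wordsPerClass.getOrDefault(label, 0);
        for (String w : tokenize(text)) {
            int c = counts.getOrDefault(w, 0);
            // Independence assumption: each word contributes its own factor.
            score += Math.log((c + 1.0) / (total + vocabulary.size()));
        }
        return score;
    }

    /** Returns the label with the highest log score. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double s = logScore(text, label);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    /** Trains on the seven rows of the training ARFF file shown in the question. */
    static TinyNaiveBayes trainedOnQuestionData() {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("good", "ham");
        nb.train("good", "ham");
        nb.train("very good", "ham");
        nb.train("bad", "spam");
        nb.train("very bad", "spam");
        nb.train("very bad, very bad", "spam");
        nb.train("good good bad", "ham");
        return nb;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = trainedOnQuestionData();
        for (String text : new String[] {"good", "bad", "very good"}) {
            System.out.println(text + " -> " + nb.classify(text));
        }
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word that was never seen in a class from forcing that class's probability to zero.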
Your code seems fine, though I have two comments to make.
1. You should initialize the filter only once, with the training data:
filter.setInputFormat(train);
so that the same filter is used for both sets and the test and training data stay compatible. You should not change the format again with this command:
filter.setInputFormat(test);
as this might create compatibility issues.
2. You look up the class name with
train.attribute(0).value((int)index);
which, it seems to me, may not correspond to the class attribute. Try using this command instead:
train.classAttribute().value((int)index);
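The compatibility point can be illustrated without Weka. The plain-Java sketch below builds a word-to-column dictionary and turns a text into a count vector. The class `SharedDictionaryDemo`, its methods, and the first-seen column ordering are all assumptions made for this illustration, not a description of how `StringToWordVector` builds its dictionary. The idea it shows is general, though: a dictionary rebuilt from different data can assign words to different columns, so a model trained on one layout may misread vectors produced with another.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of why train and test must share one dictionary. */
final class SharedDictionaryDemo {

    /** Builds a word -> column index map in first-seen order (an assumption of this sketch). */
    static Map<String, Integer> buildDictionary(List<String> docs) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String w : doc.toLowerCase().split("[^a-z]+")) {
                if (!w.isEmpty()) dict.putIfAbsent(w, dict.size());
            }
        }
        return dict;
    }

    /** Counts the words of one document into the columns defined by the dictionary. */
    static int[] vectorize(String doc, Map<String, Integer> dict) {
        int[] vector = new int[dict.size()];
        for (String w : doc.toLowerCase().split("[^a-z]+")) {
            Integer column = dict.get(w);
            if (column != null) vector[column]++; // words outside the dictionary are ignored
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> trainDocs = List.of("good", "good", "very good", "bad",
                "very bad", "very bad, very bad", "good good bad");
        List<String> testDocs = List.of("good bad very bad", "good", "good very good",
                "bad", "very good", "very very good");

        Map<String, Integer> trainDict = buildDictionary(trainDocs); // good=0, very=1, bad=2
        Map<String, Integer> testDict = buildDictionary(testDocs);   // good=0, bad=1, very=2

        // Same text, different column layout depending on which dictionary is used.
        System.out.println(Arrays.toString(vectorize("very good", trainDict)));
        System.out.println(Arrays.toString(vectorize("very good", testDict)));
    }
}
```

With the training dictionary, "very good" fills the columns for "good" and "very"; with the dictionary rebuilt from the test texts, the second count lands in the column that the training layout uses for "bad". Keeping a single filter initialized on the training data avoids this kind of mismatch.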
P.S. Check "Load naïve Bayes model in Java code using weka jar" for a complete workflow and explanation of a classification example (the material was once in SO Documentation). That example uses the LibLinear classifier, but the logic is the same.