
Simple text classification using naive bayes (weka) in java

I am trying to do text classification with Naive Bayes using the Weka library in my Java code, but I think the classification result is incorrect and I don't know what the problem is. I use ARFF files for the input.

This is my training data:

@relation hamspam

@attribute text string
@attribute class {spam,ham}

@data
'good',ham
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
'good good bad',ham

This is my testing data:

@relation test

@attribute text string
@attribute class {spam,ham}

@data
'good bad very bad',?
'good bad very bad',?
'good',?
'good very good',?
'bad',?
'very good',?
'very very good',?

And this is my code:

public static void NaiveBayes(String training_file, String testing_file) throws FileNotFoundException, IOException, Exception{
         //filter
        StringToWordVector filter = new StringToWordVector();

        Classifier naive = new NaiveBayes();

        //training data
        Instances train = new Instances(new BufferedReader(new FileReader(training_file)));
        int lastIndex = train.numAttributes() - 1;
        train.setClassIndex(lastIndex);
        filter.setInputFormat(train);
        train = Filter.useFilter(train, filter);

        //testing data
        Instances test = new Instances(new BufferedReader(new FileReader(testing_file)));
        test.setClassIndex(lastIndex);
        filter.setInputFormat(test);
        Instances test2 = Filter.useFilter(test, filter);

        naive.buildClassifier(train);

        for(int i=0; i<test2.numInstances(); i++) {
            System.out.println(test.instance(i));
            double index = naive.classifyInstance(test2.instance(i));
            String className = train.attribute(0).value((int)index);
            System.out.println(className);
        }
    }

The result indicates that the data that should have been classified as spam is classified as ham, and the data that should have been classified as ham is classified as spam. What is the problem? Please help.

Muhammad Haryadi Futra asked Jan 30 '17 11:01




1 Answer

Your code seems fine, though I have two comments to make.

  • First, you set the filter's format with filter.setInputFormat(train); so that the same filter makes the train and test data compatible. You should not change the format again with filter.setInputFormat(test); as this might create compatibility issues.
  • Also, instead of getting the first attribute with train.attribute(0).value((int)index); (which, it seems to me, may not correspond to the class attribute), try train.classAttribute().value((int)index);
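
To see why re-initializing the filter on the test set can cause trouble, here is a minimal plain-Java sketch of a bag-of-words dictionary. It is not Weka's implementation: the class VocabularyReuseDemo and the helpers buildVocabulary and vectorize are hypothetical names made up for illustration, and the data is a simplified version of the ARFF files in the question. It only shows that a dictionary built from the test data can assign words to different columns than the one the classifier was trained on.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a string-to-word-vector step; not Weka code.
class VocabularyReuseDemo {

    // Assigns each distinct word a column index in first-seen order.
    public static Map<String, Integer> buildVocabulary(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String word : doc.split("\\s+")) {
                vocab.putIfAbsent(word, vocab.size());
            }
        }
        return vocab;
    }

    // Turns a document into a 0/1 presence vector using the given dictionary.
    // Words missing from the dictionary are ignored.
    public static int[] vectorize(String doc, Map<String, Integer> vocab) {
        int[] vector = new int[vocab.size()];
        for (String word : doc.split("\\s+")) {
            Integer column = vocab.get(word);
            if (column != null) {
                vector[column] = 1;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> train = List.of("good", "very good", "bad", "very bad");
        List<String> test = List.of("good bad very bad", "good");

        // Dictionary learned once from the training data: good=0, very=1, bad=2.
        Map<String, Integer> trainVocab = buildVocabulary(train);
        // Dictionary rebuilt from the test data: good=0, bad=1, very=2.
        Map<String, Integer> testVocab = buildVocabulary(test);

        // Same text, different column layout depending on the dictionary used.
        System.out.println(Arrays.toString(vectorize("very good", trainVocab))); // [1, 1, 0]
        System.out.println(Arrays.toString(vectorize("very good", testVocab)));  // [1, 0, 1]
    }
}
```

In this sketch a model trained on the first layout would read column 1 as "very", while the second layout puts "bad" there. Calling setInputFormat only once, on the training data, and then filtering both sets with the same filter object keeps a single dictionary.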

P.S. See Load naïve Bayes model in Java code using weka jar for a complete workflow and explanation of a classification example (the material was once in SO Documentation). That example uses the LibLinear classifier, but the logic is the same.

xro7 answered Oct 13 '22 17:10