OpenNLP classifier output

Question

At the moment I'm using the following code to train a classifier model:

    final String iterations = "1000";
    final String cutoff = "0";
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
    params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());

    // try-with-resources flushes and closes the stream once the model is written
    try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"))) {
        model.serialize(modelOut);
    }

    return model;

This works, and after every run I get the following output:

    Indexing events with TwoPass using cutoff of 0

        Computing event counts...  done. 1474 events
        Indexing...  done.
    Collecting events... Done indexing in 0,03 s.
    Incorporating indexed data for training...
    done.
        Number of Event Tokens: 1474
            Number of Outcomes: 2
          Number of Predicates: 4149
    Computing model parameters...
    Stats: (998/1474) 0.6770691994572592
    ...done.

Could someone explain what this output means, and whether it says anything about the accuracy?

marcelovca90 · Accepted Answer

Looking at the source, we can tell that this output is printed by the NaiveBayesTrainer::trainModel method:

    public AbstractModel trainModel(DataIndexer di) {
        // ...
        display("done.\n");
        display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
        display("\t    Number of Outcomes: " + numOutcomes + "\n");
        display("\t  Number of Predicates: " + numPreds + "\n");
        display("Computing model parameters...\n");
        MutableContext[] finalParameters = findParameters();
        display("...done.\n");
        // ...
    }

If you take a look at the findParameters() code, you'll notice that it calls the trainingStats() method, which contains the snippet that calculates the accuracy:

    private double trainingStats(EvalParameters evalParams) {
        // ...
        double trainingAccuracy = (double) numCorrect / numEvents;
        display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
        return trainingAccuracy;
    }

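TL;DR: the Stats: (998/1474) 0.6770691994572592 part of the output is the accuracy you're looking for. It means 998 of the 1474 training events were classified correctly, roughly 67.7%. As the variable name trainingAccuracy suggests, this is measured on the training data itself.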
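To make the arithmetic behind the Stats line concrete, here is a minimal, self-contained sketch. It is not OpenNLP's code: the class TrainingAccuracySketch, the helper countCorrect, and the toy labels are hypothetical, and only the division mirrors the trainingStats() line quoted above.

```java
// Hypothetical illustration only; none of these names come from OpenNLP.
class TrainingAccuracySketch {

    // Counts the positions where the predicted label equals the expected label.
    static int countCorrect(String[] predicted, String[] expected) {
        if (predicted.length != expected.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int numCorrect = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i].equals(expected[i])) {
                numCorrect++;
            }
        }
        return numCorrect;
    }

    // Same arithmetic as the quoted line: (double) numCorrect / numEvents.
    static double accuracy(int numCorrect, int numEvents) {
        return (double) numCorrect / numEvents;
    }

    public static void main(String[] args) {
        // Toy labels, assumed for illustration: three of four predictions match.
        String[] expected  = {"A", "B", "A", "B"};
        String[] predicted = {"A", "B", "B", "B"};
        int numCorrect = countCorrect(predicted, expected);
        System.out.println("Stats: (" + numCorrect + "/" + expected.length + ") "
                + accuracy(numCorrect, expected.length));

        // The counts from the question's log.
        System.out.println(accuracy(998, 1474));
    }
}
```

With the counts from the question, accuracy(998, 1474) yields the same 0.6770691994572592 that appears in the log.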

Tags:

java

text

machine-learning

opennlp

categorization

Patrick
