
Training OpenNLP error

Tags:

java

opennlp

I am trying to train a named entity model using OpenNLP, but I am getting this error and don't know what is missing. I am new to OpenNLP; can anyone please help? I can provide the Train.txt file if needed.

lineStream = opennlp.tools.util.PlainTextByLineStream@b52598
Indexing events using cutoff of 0

Computing event counts...  done. 514 events
Indexing...  done.
Sorting and merging events... done. Reduced 514 events to 492.
Done indexing.
Incorporating indexed data for training...  
done.
Number of Event Tokens: 492
    Number of Outcomes: 1
  Number of Predicates: 3741
...done.
Computing model parameters ...
Performing 1 iterations.
1:  ... loglikelihood=0.0   1.0
Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:81)
at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:106)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:374)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:432)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:443)
at Train2.main(Train2.java:36)
Java Result: 1
BUILD SUCCESSFUL (total time: 2 seconds)

My code is this:

    File fileTrainer=new File("/home/ashfaq/Documents/Train.txt");
    File output=new File("/home/ashfaq/Documents/trainedModel.bin");
    ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream(fileTrainer), "UTF-8");
    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
    System.out.println("lineStream = " + lineStream);
    TokenNameFinderModel model = NameFinderME.train("en", "location", sampleStream, Collections.<String, Object>emptyMap(), 1, 0);

    BufferedOutputStream modelOut = null;
    try {
        modelOut = new BufferedOutputStream(new FileOutputStream(output));
        model.serialize(modelOut);
    } finally {
        if (modelOut != null)
            modelOut.close();
    }

Ashfaq asked Dec 07 '13 10:12


2 Answers

This is typically due to not having spaces after the tags in your training data. For instance,

<START:person>bob<END> 
will fail but 
<START:person> bob <END> 
will succeed.

Post a chunk of your training data if this does not fix the problem. Also, make sure each sentence in the training file is on a single line; in other words, a sentence must not contain \n and must end with \n.
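To check a training file for the spacing problem before training, something like the following may help. It is only a sketch: `TrainingDataCheck`, `tagsAreSpaced` and `badLines` are hypothetical names, not part of OpenNLP, and the regular expression assumes the `<START:type> ... <END>` tag format shown above. The `Number of Outcomes: 1` line in the question's log is at least consistent with no name spans having been parsed from the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP: flags training lines whose
// <START:type> / <END> tags are not separated from neighbouring text by whitespace.
class TrainingDataCheck {

    // Matches an opening tag such as <START:person> (or a bare <START>) and the closing <END> tag.
    private static final Pattern TAG = Pattern.compile("<START(:[^>\\s]+)?>|<END>");

    /** Returns true if every tag in the line has whitespace (or the line boundary) on both sides. */
    static boolean tagsAreSpaced(String line) {
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            boolean okBefore = m.start() == 0
                    || Character.isWhitespace(line.charAt(m.start() - 1));
            boolean okAfter = m.end() == line.length()
                    || Character.isWhitespace(line.charAt(m.end()));
            if (!okBefore || !okAfter) {
                return false;
            }
        }
        return true;
    }

    /** Returns the 1-based numbers of the lines that contain badly spaced tags. */
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!tagsAreSpaced(lines.get(i))) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up sample lines, one sentence per line.
        List<String> sample = Arrays.asList(
                "<START:person> bob <END> lives in <START:location> Paris <END> .",
                "<START:person>bob<END> lives in Paris .");
        System.out.println("Lines with badly spaced tags: " + badLines(sample));
    }
}
```

In practice you would read the lines from Train.txt instead of the hard-coded sample and fix any reported line by hand.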

Mark Giaconia answered Nov 20 '22 15:11


I know this was asked eons ago, but I faced a similar problem with categorization, and setting an appropriate cutoff solved it. So setting the cutoff to 1 might help (disclaimer: I have not tested it).

If you want to retain the default cutoff (which is 5), then each feature has to occur at least 5 times in the training data for the model to recognize it.
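As a rough illustration of what a cutoff does, here is a simplified sketch. It is not OpenNLP's actual implementation: `CutoffSketch`, `applyCutoff` and the feature strings are made up for the example, and the exact comparison OpenNLP uses is an assumption. The idea is that features seen fewer than `cutoff` times are dropped before training, so a small training file combined with a high cutoff can leave very little to learn from.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of a frequency cutoff; not OpenNLP code.
class CutoffSketch {

    /** Counts each feature and keeps only those seen at least `cutoff` times. */
    static Map<String, Integer> applyCutoff(List<String> features, int cutoff) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String feature : features) {
            counts.merge(feature, 1, Integer::sum);
        }
        // Drop every feature whose count is below the cutoff.
        counts.values().removeIf(count -> count < cutoff);
        return counts;
    }

    public static void main(String[] args) {
        // Made-up feature occurrences collected from a tiny training set.
        List<String> features = Arrays.asList("w=bob", "w=bob", "w=alice");
        System.out.println("cutoff 1 keeps: " + applyCutoff(features, 1).keySet());
        System.out.println("cutoff 2 keeps: " + applyCutoff(features, 2).keySet());
    }
}
```

With the made-up data above, a cutoff of 2 already removes the feature that was seen only once, which is the effect described for the default of 5 on small training sets.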

Rrrrr answered Nov 20 '22 16:11