How do I train a Named Entity Recognizer in OpenNLP?

Tags: java, nlp, named-entity-recognition, opennlp

OK, I have the following code to train the NER identifier from OpenNLP:

// Read train.txt line by line and parse each line into a NameSample
FileReader fileReader = new FileReader("train.txt");
ObjectStream<String> lineStream = new PlainTextByLineStream(fileReader);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream,
        Collections.<String, Object>emptyMap());
sampleStream.close();
NameFinderME nfm = new NameFinderME(model);

I don't know whether I'm doing something wrong or whether something is missing, but the classification is not working. I suspect that train.txt is wrong.

The problem is that all tokens are classified as the same type.

My train.txt looks something like the following example, but with many more entries and much more variation. Also, I'm classifying the text one word at a time, not all of its tokens at once.

<START:distance> 8000m <END>
<START:temperature> 100ºC <END>
<START:weight> 50kg <END>
<START:name> Renato <END>

Can somebody show me what I'm doing wrong?


asked Aug 05 '11 06:08

Renato Dinhani

1 Answer

Your training data is not OK.

You should put each entity in context, inside a complete sentence:

At an altitude of <START:distance> 8000m <END> the temperature of boiling water is less than <START:temperature> 100ºC <END> .
The climber <START:name> Renato <END> is carrying <START:weight> 50kg <END> of equipment .
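To make the format concrete, here is a small stand-alone sketch (plain Java, no OpenNLP dependency) that parses one training line as laid out above: one whitespace-tokenized sentence per line, with each entity wrapped in `<START:type>` ... `<END>`. The class `TrainingLineParser` and everything in it are hypothetical illustrations, not OpenNLP's own parser (in the question's code, `NameSampleDataStream` does this parsing internally), and it only handles typed `<START:type>` tags. Notice that the ordinary tokens around each entity survive as context, whereas a line like `<START:name> Renato <END>` yields a single token and nothing else.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of OpenNLP): parses one line of name finder
 * training data, i.e. one whitespace-tokenized sentence in which entities
 * are wrapped as <START:type> token ... <END>.
 */
public class TrainingLineParser {

    /** An annotated entity covering token indices [start, end). */
    public static final class Entity {
        public final String type;
        public final int start;
        public final int end;
        public final String text;

        Entity(String type, int start, int end, String text) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** The plain tokens of the sentence plus the entities marked on them. */
    public static final class Parsed {
        public final List<String> tokens;
        public final List<Entity> entities;

        Parsed(List<String> tokens, List<Entity> entities) {
            this.tokens = tokens;
            this.entities = entities;
        }
    }

    public static Parsed parse(String line) {
        List<String> tokens = new ArrayList<>();
        List<Entity> entities = new ArrayList<>();
        if (line.trim().isEmpty()) {
            return new Parsed(tokens, entities); // blank line: no sentence
        }
        String openType = null; // type of the entity currently open, if any
        int openStart = -1;      // token index where that entity began
        for (String part : line.trim().split("\\s+")) {
            if (part.startsWith("<START:") && part.endsWith(">")) {
                if (openType != null) {
                    throw new IllegalArgumentException("nested <START> tag: " + part);
                }
                openType = part.substring("<START:".length(), part.length() - 1);
                openStart = tokens.size();
            } else if (part.equals("<END>")) {
                if (openType == null || tokens.size() == openStart) {
                    throw new IllegalArgumentException("<END> without a non-empty entity");
                }
                String text = String.join(" ", tokens.subList(openStart, tokens.size()));
                entities.add(new Entity(openType, openStart, tokens.size(), text));
                openType = null;
            } else {
                tokens.add(part); // an ordinary token: context for the model
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("unclosed <START:" + openType + "> tag");
        }
        return new Parsed(tokens, entities);
    }

    public static void main(String[] args) {
        Parsed p = parse("The climber <START:name> Renato <END> is carrying "
                + "<START:weight> 50kg <END> of equipment .");
        // Prints each entity's type, token range and text, then the token count
        for (Entity e : p.entities) {
            System.out.println(e.type + " [" + e.start + "," + e.end + ") " + e.text);
        }
        System.out.println(p.tokens.size() + " tokens");
    }
}
```

Running `main` on the second example sentence (with the final period split off as its own token) prints `name [2,3) Renato`, `weight [5,6) 50kg` and `9 tokens`: two entities embedded in nine tokens of context.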

You will get better results if your training data comes from real-world sentences written in the same style as the sentences you are classifying. For example, you should train on a newspaper corpus if you will be processing news.

Also, you will need thousands of sentences to build your model! You could start with a hundred or so to bootstrap, use that weak first model to improve your corpus, and then train your model again.

And of course you should classify all the tokens of a sentence at once; otherwise there is no context from which to decide an entity's type.
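Why whole sentences matter can be illustrated with a toy feature extractor. The sketch below is a simplified, hypothetical stand-in, loosely inspired by the token-window context features a statistical name finder typically uses; it is not OpenNLP's actual feature generator, and the names `ContextFeatures` and `features` are made up for illustration. Each token is described by itself plus its neighbours within a window, so a token passed in alone has only its own identity as evidence.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy, hypothetical context-feature extractor. It is NOT OpenNLP's feature
 * generator; it only illustrates window-based context features.
 */
public class ContextFeatures {

    /** Features for tokens[i]: the token itself plus neighbours up to `window` positions away. */
    public static List<String> features(String[] tokens, int i, int window) {
        List<String> feats = new ArrayList<>();
        feats.add("w=" + tokens[i].toLowerCase());
        for (int d = 1; d <= window; d++) {
            if (i - d >= 0) {
                feats.add("w-" + d + "=" + tokens[i - d].toLowerCase()); // token d to the left
            }
            if (i + d < tokens.length) {
                feats.add("w+" + d + "=" + tokens[i + d].toLowerCase()); // token d to the right
            }
        }
        return feats;
    }

    public static void main(String[] args) {
        String[] sentence = "The climber Renato is carrying 50kg of equipment .".split(" ");
        // Whole sentence: "Renato" (index 2) is also described by its neighbours
        System.out.println(features(sentence, 2, 2));
        // Word by word: the same token on its own has no neighbours to use
        System.out.println(features(new String[] {"Renato"}, 0, 2));
    }
}
```

The first call prints `[w=renato, w-1=climber, w+1=is, w-2=the, w+2=carrying]`, the second only `[w=renato]`. In OpenNLP itself, the equivalent fix is to tokenize the whole sentence and pass the full token array to `nfm.find(tokens)`, which returns `Span` objects for the entities it finds, rather than calling it with one-word arrays.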


answered Oct 05 '22 22:10

wcolen
