
How to use OpenNLP with Java?

I want to POS-tag an English sentence and do some processing, and I would like to use OpenNLP. I have it installed.

When I execute the command

I:\Workshop\Programming\nlp\opennlp-tools-1.5.0-bin\opennlp-tools-1.5.0>java -jar opennlp-tools-1.5.0.jar POSTagger models\en-pos-maxent.bin < Text.txt

It POS-tags the input in Text.txt and prints:

    Loading POS Tagger model ... done (4.009s)
My_PRP$ name_NN is_VBZ Shabab_NNP i_FW am_VBP 22_CD years_NNS old._.

Average: 66.7 sent/s
Total: 1 sent
Runtime: 0.015s

Does this mean it is installed properly?

Now, how do I do this POS tagging from inside a Java application? I have added the opennlp-tools, jwnl, and maxent jars to the project, but how do I invoke the POS tagger?

asked Apr 29 '11 at 18:04 by shababhsiddique



1 Answer

Here's some (old) sample code I threw together, with modernized code to follow:

package opennlp;

import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

import java.io.File;
import java.io.IOException;
import java.io.StringReader;

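public class OpenNlpTest {
    public static void main(String[] args) throws IOException {
        // Load the pre-trained model (OpenNLP 1.5 API); the loader prints "Loading POS Tagger model ... done"
        POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
        PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
        POSTaggerME tagger = new POSTaggerME(model);

        String input = "Can anyone help me dig through OpenNLP's horrible documentation?";
        ObjectStream<String> lineStream =
                new PlainTextByLineStream(new StringReader(input));

        perfMon.start();
        String line;
        while ((line = lineStream.read()) != null) {
            // Split on whitespace, then tag: tags[i] is the POS tag for whitespaceTokenizerLine[i]
            String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
            String[] tags = tagger.tag(whitespaceTokenizerLine);

            // POSSample.toString() renders the sentence as token_TAG pairs
            POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
            System.out.println(sample.toString());

            perfMon.incrementCounter();
        }
        // Prints the "Average / Total / Runtime" summary to System.err
        perfMon.stopAndPrintFinalResult();
    }
}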


The output is:

Loading POS Tagger model ... done (2.045s)
Can_MD anyone_NN help_VB me_PRP dig_VB through_IN OpenNLP's_NNP horrible_JJ documentation?_NN

Average: 76.9 sent/s 
Total: 1 sent
Runtime: 0.013s
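The tokens in that output come from WhitespaceTokenizer, which only splits on spaces; that is why punctuation stays glued to words (documentation?_NN here, old._. in the question). OpenNLP also ships opennlp.tools.tokenize.SimpleTokenizer, which splits on character classes instead. The standard-library sketch below only illustrates that idea: the class name PunctuationSplitter and its regex (letter runs, digit runs, a possessive 's, and single punctuation marks) are assumptions for illustration, not OpenNLP's actual rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationSplitter {
    // Hypothetical rule set (not OpenNLP's): a run of letters, a run of digits,
    // a possessive "'s" not followed by a letter, or any single other non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}+|\\p{N}+|'s(?!\\p{L})|[^\\p{L}\\p{N}\\s]");

    public static String[] tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "Can anyone help me dig through OpenNLP's horrible documentation?";
        // Prints the tokens separated by " | "
        System.out.println(String.join(" | ", tokenize(input)));
    }
}
```

An array produced this way could be passed to tagger.tag(...) in place of the whitespace tokens, so that "?" and "." get tagged on their own.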

The sample code is essentially adapted from the POSTaggerTool class that ships with OpenNLP. sample.getTags() returns a String array containing the tags themselves.
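Because tagger.tag(tokens) returns exactly one tag per token, the token array and the tag array are parallel, so you can walk them together by index instead of going through POSSample. A minimal standard-library sketch: the arrays are hard-coded copies of the tagger output shown above, standing in for real tagger.tag(...) calls, and the class name TagPairs and the "keep tags starting with NN" rule (the Penn Treebank noun tags NN, NNS, NNP, NNPS share that prefix) are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

public class TagPairs {
    // Returns the tokens whose Penn Treebank tag is a noun tag (NN, NNS, NNP, NNPS).
    public static List<String> nouns(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must be parallel arrays");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NN")) {
                result.add(tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for the whitespace tokens and for the result of tagger.tag(tokens)
        String[] tokens = {"Can", "anyone", "help", "me", "dig", "through",
                "OpenNLP's", "horrible", "documentation?"};
        String[] tags = {"MD", "NN", "VB", "PRP", "VB", "IN", "NNP", "JJ", "NN"};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + "\t" + tags[i]);
        }
        System.out.println("Nouns: " + nouns(tokens, tags));
    }
}
```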

This requires direct file access to the model file (en-pos-maxent.bin), which is really, really lame.
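One way around the hard-coded path is to package the model as a resource and hand an InputStream to POSModel, which has a POSModel(InputStream) constructor. The sketch below covers only the stream-opening half using the standard library; the class ModelSource, the helper openModel, and the "classpath first, then working directory" fallback order are hypothetical choices for illustration.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ModelSource {
    // Hypothetical helper: look for the model on the classpath (e.g. under src/main/resources)
    // and fall back to a plain file path. The caller is responsible for closing the stream.
    public static InputStream openModel(String name) throws IOException {
        InputStream fromClasspath = ModelSource.class.getClassLoader().getResourceAsStream(name);
        if (fromClasspath != null) {
            return fromClasspath;
        }
        Path file = Paths.get(name);
        if (Files.exists(file)) {
            return Files.newInputStream(file);
        }
        throw new FileNotFoundException("Model not found on classpath or at " + file.toAbsolutePath());
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = openModel("en-pos-maxent.bin")) {
            // With opennlp-tools on the classpath, this is where the stream would be used:
            // POSModel model = new POSModel(in);
            System.out.println("Model stream opened.");
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```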

An updated codebase for this is a little different (and probably more useful).

First, a Maven POM:

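<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <!-- project coordinates, Java 8 compiler level, and the opennlp-tools and testng dependencies go here -->
</project>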


And here's the code, written as a TestNG test, so it lives in ./src/test/java/org/javachannel/opennlp/example:

package org.javachannel.opennlp.example;

import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.stream.Stream;

public class POSTest {
    // Copies the resource at `url` into `destination`, closing both channels afterwards
    private void download(String url, File destination) throws IOException {
        URL website = new URL(url);
        try (ReadableByteChannel rbc = Channels.newChannel(website.openStream());
             FileOutputStream fos = new FileOutputStream(destination)) {
            fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }

    // One test invocation per row; each row holds a single Object[] of sentences
    @DataProvider
    public Object[][] getCorpusData() {
        return new Object[][]{{
                new Object[]{"Can anyone help me dig through OpenNLP's horrible documentation?"}
        }};
    }

    @Test(dataProvider = "getCorpusData")
    public void showPOS(Object[] input) throws IOException {
        File modelFile = new File("en-pos-maxent.bin");
        if (!modelFile.exists()) {
            System.out.println("Downloading model.");
            download("http://opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin", modelFile);
        }
        POSModel model = new POSModel(modelFile);
        PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
        POSTaggerME tagger = new POSTaggerME(model);

        perfMon.start();
        Stream.of(input).map(line -> {
            // Tokenize on whitespace and tag each token
            String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line.toString());
            String[] tags = tagger.tag(whitespaceTokenizerLine);

            POSSample sample = new POSSample(whitespaceTokenizerLine, tags);

            perfMon.incrementCounter();
            return sample.toString();
        }).forEach(System.out::println);
        perfMon.stopAndPrintFinalResult();
    }
}

This code doesn't actually test anything (it's a smoke test, if anything), but it should serve as a starting point. Another potentially nice thing is that it downloads the model for you if you don't have it already.
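One weakness of the exists() check is that an interrupted download leaves a partial en-pos-maxent.bin behind, which later runs would then try to load. Below is a sketch of a more defensive variant using only java.nio: it downloads into a temporary file in the same directory and moves it into place only after the copy has finished. The class ModelDownloader and the method downloadIfMissing are hypothetical names, and ATOMIC_MOVE is not supported on every file system (Files.move then throws AtomicMoveNotSupportedException).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ModelDownloader {
    // Fetches `url` into `destination` only if it is not already there; returns true if it downloaded.
    // Data goes to a temp file next to the destination first, so an interrupted
    // download never leaves a truncated model under the final name.
    public static boolean downloadIfMissing(URL url, Path destination) throws IOException {
        if (Files.exists(destination)) {
            return false;
        }
        Path dir = destination.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "download-", ".part");
        try (InputStream in = url.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, destination, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op once the move has succeeded
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path model = Paths.get("en-pos-maxent.bin");
        URL source = URI.create("http://opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin").toURL();
        System.out.println(downloadIfMissing(source, model) ? "Downloaded model." : "Model already present.");
    }
}
```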

answered Sep 21 '22 at 13:09 by Joseph Ottinger