 

Simple Natural Language Processing Startup for Java [duplicate]

Tags:

java

nlp

I want to start developing an NLP project, but I don't know much about the tools available. After googling for about a month, I came to the conclusion that OpenNLP could be my solution.

Unfortunately, I can't find a complete tutorial on using the API; all the ones I've seen skip some general steps. I need a tutorial that starts from the ground up. I have seen a lot of downloads on the site but don't know how to use them. Do I need to train something? Here is what I want to know:

How do I install / set up an NLP system that can:

  1. parse an English sentence into words
  2. identify the different parts of speech
asked Apr 29 '11 13:04 by shababhsiddique




2 Answers

You say that you need to 'parse' each sentence. You probably already know this, but just to be explicit, in NLP, the term 'parse' usually means to recover some hierarchical syntactic structure. The most common types are constituent structure (e.g., via a context-free grammar) and dependency structure.
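For a concrete sense of the difference, here is roughly what the two representations look like for a short illustrative sentence (Penn Treebank-style brackets for the constituent parse, Stanford-style typed-dependency triples for the dependency parse; the example is mine, not any particular parser's output):

    Constituent: (S (NP (DT The) (NN dog)) (VP (VBD barked)))
    Dependency:  det(dog-2, The-1)   nsubj(barked-3, dog-2)   root(ROOT-0, barked-3)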

If you need hierarchical structure, I'd recommend you consider just starting with a parser. Most parsers I'm aware of include POS tagging during parsing, and may provide higher-accuracy tagging than finite-state POS taggers. (Caveat: I'm much more familiar with constituent parsers than with dependency parsers. It's possible that some or most dependency parsers require POS tags as input.)

The big downside to parsing is the time complexity. Finite-state POS taggers often run at thousands of words per second. Even greedy dependency parsers are considerably slower, and constituent parsers generally run at 1-5 sentences per second. So if you don't need hierarchical structure, you probably want to stick with a finite-state POS tagger for efficiency.

If you do decide you need parse structure, a few recommendations:

I think the Stanford parser suggested by @aab includes both a constituent parser and a dependency parser; a minimal usage sketch follows after these recommendations.

The Berkeley Parser ( http://code.google.com/p/berkeleyparser/ ) is a pretty well-known PCFG constituent parser that achieves state-of-the-art accuracy (equal or superior to the Stanford parser, I believe) and is reasonably efficient (~3-5 sentences per second).

The BUBS Parser ( http://code.google.com/p/bubs-parser/ ) can also run with the high-accuracy Berkeley grammar, and improves efficiency to around 15-20 sentences/second. Full disclosure - I'm one of the primary researchers working on this parser.

Warning: both of these parsers are research code, with all the problems that engenders. But I'd love to see people actually using BUBS, so if it's of use to you, give it a try and contact me with problems, comments, suggestions, etc.
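As a concrete illustration of the Stanford parser mentioned above, here is a minimal Java sketch adapted from the ParserDemo program that ships with the parser download. The model path and class/method names are the ones used by recent releases and may differ in older versions, so treat it as a sketch rather than a definitive recipe:

    import java.util.Arrays;
    import java.util.List;

    import edu.stanford.nlp.ling.HasWord;
    import edu.stanford.nlp.ling.Word;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.GrammaticalStructure;
    import edu.stanford.nlp.trees.GrammaticalStructureFactory;
    import edu.stanford.nlp.trees.PennTreebankLanguagePack;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreebankLanguagePack;

    public class ParserSketch {
        public static void main(String[] args) {
            // Pre-trained English PCFG grammar from the models jar in the parser download.
            LexicalizedParser lp = LexicalizedParser.loadModel(
                    "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

            // The parser takes an already-tokenized sentence.
            List<? extends HasWord> sentence = Arrays.asList(
                    new Word("The"), new Word("dog"), new Word("barked"), new Word("."));

            // Constituent (phrase-structure) parse; POS tags appear on the leaves.
            Tree tree = lp.apply(sentence);
            tree.pennPrint();

            // Derive typed dependencies from the constituent tree.
            TreebankLanguagePack tlp = new PennTreebankLanguagePack();
            GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
            GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
            System.out.println(gs.typedDependencies());
        }
    }

This covers both of the original requirements: the bracketed tree shows the phrase structure with a POS tag on every word, and the last line prints the typed dependencies.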

And a couple of Wikipedia references for background, if needed:

  • Context-free grammars: http://en.wikipedia.org/wiki/Stochastic_context-free_grammar

  • Dependency grammars: http://en.wikipedia.org/wiki/Dependency_grammar

answered Oct 16 '22 16:10 by AaronD


Generally you'd do these two tasks in the other order:

  1. Do part-of-speech tagging
  2. Run a parser using the POS tags as input

OpenNLP's documentation isn't that thorough, and some of it has gotten hard to find since the switch to Apache. Some (potentially slightly out-of-date) tutorials are available in the old SourceForge wiki.
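If you stick with OpenNLP, the tokenize-then-tag step is only a few lines. The sketch below assumes the Apache-era (1.5-style) API and the standard pre-trained English models, en-token.bin and en-pos-maxent.bin, downloaded separately from the OpenNLP models page:

    import java.io.FileInputStream;
    import java.io.InputStream;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    public class OpenNlpTagSketch {
        public static void main(String[] args) throws Exception {
            try (InputStream tokStream = new FileInputStream("en-token.bin");
                 InputStream posStream = new FileInputStream("en-pos-maxent.bin")) {

                // Step 1: split the raw sentence into tokens.
                TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokStream));
                String[] tokens = tokenizer.tokenize("The quick brown fox jumps over the lazy dog.");

                // Step 2: tag each token with its part of speech (Penn Treebank tags).
                POSTaggerME tagger = new POSTaggerME(new POSModel(posStream));
                String[] tags = tagger.tag(tokens);

                for (int i = 0; i < tokens.length; i++) {
                    System.out.println(tokens[i] + "/" + tags[i]);
                }
            }
        }
    }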

You might want to take a look at the Stanford NLP tools, in particular the Stanford POS Tagger and the Stanford Parser. Both downloads include pre-trained model files, demo files in the top-level directory that show how to get started with the API, and short shell scripts that show how to use the tools from the command line.
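For example, tagging a sentence with the Stanford POS Tagger looks roughly like this; the model file name is the one bundled with recent tagger downloads and may differ in your copy:

    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class StanfordTaggerSketch {
        public static void main(String[] args) throws Exception {
            // Path to a pre-trained model inside the unpacked tagger download.
            MaxentTagger tagger = new MaxentTagger("models/english-left3words-distsim.tagger");

            // Returns the sentence with a tag appended to each word, e.g. "fox_NN".
            String tagged = tagger.tagString("The quick brown fox jumps over the lazy dog.");
            System.out.println(tagged);
        }
    }

The tagged tokens can then be fed to a parser if you need the full structure.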

LingPipe might be another good toolkit to check out. A quick search here will lead you to a number of similar questions with links to other alternatives, too!

answered Oct 16 '22 17:10 by aab