I need to split text into sentences. I'm currently playing around with OpenNLP's sentence detector tool. I've also heard of the NLTK and Stanford CoreNLP tools. What are the most accurate English sentence detection tools out there? I don't need many NLP features, just a good tool for sentence splitting/detection.
I've also heard about Lucene...but that may be too much. But if it has a kick-ass sentence detection module, then I'll use it.
For splitting sentences, first mark the clauses. Then make the sub-clauses independent by omitting subordinating linkers and inserting subjects or other words wherever necessary. Example: When I went to Delhi, I met my friend who lives there.
Sentence splitting is the process of separating free-flowing text into sentences. It is one of the first steps in any natural language processing (NLP) application, including the AI-driven Scribendi Accelerator.
NLTK includes an implementation of the Punkt tokenizer described in this paper. I don't know if it's the absolute best around, but it's very good, lightweight, easy to use, and free.
Check the LingPipe implementation: http://alias-i.com/lingpipe/docs/api/com/aliasi/sentences/IndoEuropeanSentenceModel.html
Their model is quite powerful and easy to implement: it checks a few pre/post rules (i.e. regexps) at each possible sentence split, and that's all. I found it to work better than the ones in GATE and OpenNLP.
Another open-source project supports this heuristic model, for example: http://code.google.com/p/graph-expression/wiki/SentenceSplitting
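To give a feel for the "pre/post rules at each possible split" idea, here is a minimal Python sketch. It is not LingPipe's actual model or API. The abbreviation list, the regexp, and the function name `split_sentences` are all assumptions made up for illustration, and a real model carries far more rules.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (illustration only).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs"}

# Candidate split: sentence-final punctuation, optional closing quotes or
# brackets, then whitespace.
CANDIDATE = re.compile(r"[.!?]+[\"')\]]*\s+")


def split_sentences(text):
    """Split text at candidate boundaries that pass the pre/post rules."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        words = text[start:match.start()].split()
        last_token = words[-1].lstrip("\"'([") if words else ""
        next_char = text[match.end():match.end() + 1]

        # Pre-rules (periods only): do not split after a known abbreviation
        # or after a single capital letter such as an initial.
        if match.group().startswith("."):
            if last_token.lower() in ABBREVIATIONS:
                continue
            if len(last_token) == 1 and last_token.isupper():
                continue

        # Post-rule: a sentence should not start with a lowercase letter.
        if next_char.islower():
            continue

        sentences.append(text[start:match.end()].strip())
        start = match.end()

    # Whatever follows the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = "Dr. Smith went to Delhi. He met J. K. Rowling there! Was it fun? Yes."
print(split_sentences(text))
# ['Dr. Smith went to Delhi.', 'He met J. K. Rowling there!', 'Was it fun?', 'Yes.']
```

The obvious weakness of a sketch like this is that every rule has exceptions (a sentence ending in "I." would be missed by the initial rule, for instance), so accuracy depends almost entirely on how complete the rule and abbreviation lists are.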