What is the most accurate open-source tool for sentence splitting? [closed]

I need to split text into sentences. I'm currently playing around with OpenNLP's sentence detector. I've also heard of NLTK and Stanford CoreNLP. What are the most accurate English sentence detection tools out there? I don't need many NLP features, only a good tool for sentence splitting/detection.

I've also heard about Lucene, but that may be overkill. If it has a kick-ass sentence detection module, though, I'll use it.
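"Most accurate" depends on your text, so one way to compare candidate splitters is to score them against a small hand-split sample. The sketch below counts sentence boundaries as character offsets and reports precision, recall, and F1. It is an illustrative helper, not part of OpenNLP, NLTK, CoreNLP, or Lucene; `boundary_offsets` and `boundary_scores` are hypothetical names. It assumes every sentence a splitter returns appears verbatim in the original text.

```python
def boundary_offsets(text: str, sentences: list[str]) -> set[int]:
    """Character offsets where each sentence ends (the final end is excluded)."""
    ends = []
    pos = 0
    for sent in sentences:
        start = text.find(sent, pos)  # locate the sentence after the previous one
        if start < 0:
            raise ValueError(f"sentence not found in text: {sent!r}")
        pos = start + len(sent)
        ends.append(pos)
    return set(ends[:-1])  # the end of the last sentence is not a real boundary


def boundary_scores(text: str, gold: list[str], predicted: list[str]) -> tuple[float, float, float]:
    """Boundary precision, recall, and F1 of `predicted` against `gold`."""
    g = boundary_offsets(text, gold)
    p = boundary_offsets(text, predicted)
    hits = len(g & p)
    precision = hits / len(p) if p else 1.0  # vacuously perfect if no boundaries predicted
    recall = hits / len(g) if g else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


text = "Dr. Smith left. He came back."
gold = ["Dr. Smith left.", "He came back."]
naive = ["Dr.", "Smith left.", "He came back."]  # e.g. output of a naive split on ". "
precision, recall, f1 = boundary_scores(text, gold, naive)
print(precision, recall, round(f1, 3))
```

The naive splitter finds the real boundary (recall 1.0) but also breaks after "Dr." (precision 0.5). Running each tool over the same sample gives a like-for-like comparison.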

asked by samxli on Mar 14 '11 at 16:03





2 Answers

NLTK includes an implementation of the Punkt tokenizer described in Kiss and Strunk (2006), "Unsupervised Multilingual Sentence Boundary Detection". I don't know if it's the absolute best around, but it's very good, lightweight, easy to use, and free.
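Below is a minimal sketch of driving NLTK's Punkt tokenizer directly. It assumes NLTK is installed. To keep the example free of data downloads, it supplies a small hand-written abbreviation list (an assumption for this demo) instead of loading NLTK's pretrained English model. Punkt stores abbreviations in lowercase and without the trailing period.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Hand-supplied abbreviations (illustrative, not NLTK's trained list):
# lowercase, no trailing period.
params = PunktParameters()
params.abbrev_types = {"dr", "mr", "mrs", "prof"}

tokenizer = PunktSentenceTokenizer(params)

text = "Dr. Smith went to Washington. He met Mr. Jones there."
sentences = tokenizer.tokenize(text)
for s in sentences:
    print(s)
```

In practice you would usually call `nltk.sent_tokenize(text)` after downloading the pretrained English model. Depending on your NLTK version, that resource is named `punkt_tab` or `punkt`. Because Punkt is unsupervised, you can also pass raw text from your own domain as `train_text` to `PunktSentenceTokenizer` so that it learns the abbreviations used there.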

answered by rmalouf on Sep 23 '22 at 21:09



Check out the LingPipe implementation: http://alias-i.com/lingpipe/docs/api/com/aliasi/sentences/IndoEuropeanSentenceModel.html

Their model is quite powerful and easy to implement: at every possible sentence split, check a few pre/post rules (i.e. regexps), and that's all. I found it works better than the ones in GATE and OpenNLP.
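Here is a minimal sketch of the "check pre/post rules at each possible split" idea in plain Python. It is not LingPipe's IndoEuropeanSentenceModel. The abbreviation table `ABBREVIATIONS`, the `CANDIDATE` regex, and `split_sentences` are hypothetical simplifications chosen for illustration. Each candidate stop must pass a pre rule (the preceding word is not a known abbreviation) and a post rule (the next word does not start in lowercase).

```python
import re

# Hypothetical "pre" rule table: words that end in a period without ending a sentence.
ABBREVIATIONS = {"mr", "mrs", "dr", "prof", "st", "vs", "etc", "e.g", "i.e"}

# Candidate stop: one or more of . ! ? plus optional closing quotes/brackets,
# followed by whitespace.
CANDIDATE = re.compile(r"[.!?]+[\"')\]]*(?=\s)")


def split_sentences(text: str) -> list[str]:
    sentences = []
    start = 0
    for m in CANDIDATE.finditer(text):
        # Pre rule: a period directly after a known abbreviation is not a boundary.
        before = text[start:m.start()].split()
        prev_word = before[-1].lower() if before else ""
        if m.group().startswith(".") and prev_word in ABBREVIATIONS:
            continue
        # Post rule: a boundary must not be followed by a lowercase word.
        rest = text[m.end():].lstrip()
        if rest and rest[0].islower():
            continue
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()  # whatever follows the last accepted boundary
    if tail:
        sentences.append(tail)
    return sentences


sample = 'Dr. Smith arrived late. He said "Sorry!" and sat down. The talk, e.g. the keynote, ran long.'
result = split_sentences(sample)
for s in result:
    print(s)
```

Accuracy in this kind of model comes almost entirely from the quality of the rule tables. A real implementation would use a much larger abbreviation list and more context on both sides of each candidate.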

Another open-source project that supports this heuristic model is, for example, http://code.google.com/p/graph-expression/wiki/SentenceSplitting

answered by yura on Sep 25 '22 at 21:09
