 

Regular Expression to find the end of sentences

Tags: java, regex

I am writing a regular expression to find the end of sentences in a text. For this I assume that any sentence can end with ., ! or ?. Sometimes, though, people like to write !!!!!! at the end of their sentence, so I want to replace any repeated dots, exclamation marks or question marks with a single one. But I want to allow the use of '...'. How can I include this exception? Please advise, thanks!

// Needs java.util.regex.Matcher, java.util.regex.Pattern and
// java.util.regex.PatternSyntaxException
Pattern p = null;
try {
    // ([!?.] with optional spaces), followed by ([!?.] with optional spaces) repeated 1 or more times
    p = Pattern.compile("([!?.]\\s*)([!?.]\\s*)+");
}
catch (PatternSyntaxException pex) {
    pex.printStackTrace();
    System.exit(1); // non-zero status: this is an error exit
}

// get the matcher
Matcher m = p.matcher(this.sentence);
int index = 0;
while (m.find(index))
{
    System.out.println(this.sentence);
    System.out.println(p.toString());
    // the matched run, quoted so that it is replaced literally
    String toReplace = Pattern.quote(m.group());
    // keep only the first punctuation mark of the run
    String replacement = "" + m.group().charAt(0);
    this.sentence = this.sentence.replaceAll(toReplace, replacement);
    System.out.println("");
    // the text has changed: continue just after the kept mark, on the new string
    index = m.start() + 1;
    m.reset(this.sentence);
    System.out.println(this.sentence);
}
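
The rule described in the question (collapse a run of ., ! or ? to a single mark, but leave '...' alone) can be sketched with one pass over the matches. This is only one possible reading of the requirement: it assumes that exactly three dots count as an ellipsis, that every other run is reduced to its first character, and that the marks in a run are not separated by spaces. The names PunctuationCollapser and collapse are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctuationCollapser {
    // A run of two or more sentence-ending marks, e.g. "!!!", "?!" or "....".
    private static final Pattern RUN = Pattern.compile("[.!?]{2,}");

    // Collapses each run to its first mark, except for a literal "..." (assumed ellipsis).
    static String collapse(String text) {
        Matcher m = RUN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String run = m.group();
            String replacement = run.equals("...") ? run : run.substring(0, 1);
            // quoteReplacement guards against '$' and '\' being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Really? Wait... No way!
        System.out.println(collapse("Really?!! Wait... No way!!!!!!"));
    }
}
```

Deciding the exception in Java code rather than inside the pattern keeps the regex simple; a lookahead-based pattern could do the same job but is harder to read.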
Rob Hufschmitt asked Nov 06 '22 01:11



1 Answer

Disclaimer: my answer will be off topic (not using regular expressions).

If it's not too heavyweight, try using Apache OpenNLP. NLP stands for "natural language processing". Check its documentation on sentence detection.

The relevant bit of code is:

String[] sentences = sentenceDetector.sentDetect("  First sentence. Second sentence. ");

You'll get an array of two Strings. The first one will be "First sentence.", the second one will be "Second sentence.".

More code has to be written before that line will work (sentenceDetector has to be created and initialized first), but you get the general idea.
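
If OpenNLP turns out to be too heavyweight, the JDK has a regex-free option in the same spirit: java.text.BreakIterator. To be clear, this is a different technique from OpenNLP and not what the answer above uses, and its rule-based boundaries may be cruder than those of a trained model. A minimal sketch, assuming English text; SentenceSplitter and split are names made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    // Splits text at the sentence boundaries reported by the JDK's BreakIterator.
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // each segment may carry surrounding whitespace, so trim it
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("  First sentence. Second sentence. ")) {
            System.out.println(s);
        }
    }
}
```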

darioo answered Nov 09 '22 15:11
