
How to split a string based on punctuation marks and whitespace?

Tags:

java

regex

I have a String that I want to split based on punctuation marks and whitespace. What should be the regex argument to the split() method?

andandandand asked Apr 08 '11 22:04



1 Answer

Here is some code with handling for messy input thrown in. Notice that the output loop skips empty tokens, which appear whenever two delimiters sit next to each other; that's quick and dirty. You can add whatever other characters you need to split on (and drop) to the character class in the regex pattern. (tchrist is right: \s is woefully implemented and only works in some very simple cases, so the pattern lists the whitespace characters explicitly.)

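public class SomeClass {
    public static void main(String[] args) {
        String input = "The\rquick!brown  - fox\t\tjumped?over;the,lazy\n,,..  \nsleeping___dog.";

        // Split on any single punctuation character (\p{P}), space, tab, newline, or carriage return.
        for (String s : input.split("[\\p{P} \\t\\n\\r]")) {
            // Adjacent delimiters produce empty tokens; skip them.
            if (s.isEmpty()) continue;
            System.out.println(s);
        }
    }
}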


INPUT:

The
quick!brown  - fox      jumped?over;the,lazy
,,..  
sleeping___dog.

OUTPUT:

The
quick
brown
fox
jumped
over
the
lazy
sleeping
dog
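
As a variation on the code above, here is a minimal sketch that puts a + quantifier after the character class so that a run of delimiters is treated as one separator. It also uses \s rather than listing whitespace characters; by default \s in Java matches only ASCII whitespace, which is enough for this sample input but may not be for yours. The class name SplitSketch and the method tokenize are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitSketch {
    // One or more punctuation (\p{P}) or ASCII whitespace (\s) characters.
    // The trailing + collapses runs of delimiters into a single separator.
    private static final Pattern DELIMITERS = Pattern.compile("[\\p{P}\\s]+");

    // Hypothetical helper: returns the non-empty tokens of the input.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String token : DELIMITERS.split(input)) {
            // A delimiter at the very start of the input still yields
            // one empty leading token, so the check is kept.
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "The\rquick!brown  - fox\t\tjumped?over;the,lazy\n,,..  \nsleeping___dog.";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Keep in mind that \p{P} covers Unicode punctuation only. Characters such as $, +, = and | are symbols (\p{S}), not punctuation, so add them to the character class if you need to split on them too.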
Paul Sasik answered Sep 20 '22 13:09
