
Regex String.split( )

Tags: java, string, regex

I'm using Java to split a String of the form:

String stringToSplit = "AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KKK";

I'm using

String[] tokens = stringToSplit.split("\\s");

to split the strings along whitespace, giving:

tokens = {"AAA","BBB","CCC","DDD","EEE","FFF","GGG","HHH","III","JJJ","KKK"}

What I need to do now is split along whitespace for most of them, but also keep some strings together in specific cases. For instance, I want "CCC DDD" and "III JJJ KKK" to stay as their full strings when I split. So I want my array of tokens to be:

tokens = {"AAA","BBB","CCC DDD","EEE","FFF","GGG","HHH","III JJJ KKK"}

What regex would I use? Is this possible?

mainstringargs asked Apr 21 '26 11:04


2 Answers

You could replace "CCC DDD" with "CCC_DDD" before splitting, and then convert the underscore back to a space afterwards.

You might want to invest in some kind of syntax parser if you're going to be doing lots of this kind of thing.
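The placeholder idea above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name `PlaceholderSplit`, the `split` helper, and the choice of `_` as placeholder are all assumptions, and the approach only works if the placeholder never occurs in the real input.

```java
import java.util.Arrays;
import java.util.List;

public class PlaceholderSplit {
    // Hypothetical placeholder; assumed never to appear in the input itself.
    private static final String PLACEHOLDER = "_";

    public static String[] split(String input, List<String> phrases) {
        String protectedInput = input;
        for (String phrase : phrases) {
            // String.replace works on literal text, so the phrases need no regex escaping.
            protectedInput = protectedInput.replace(phrase, phrase.replace(" ", PLACEHOLDER));
        }
        // Split on runs of whitespace, then turn the placeholders back into spaces.
        String[] tokens = protectedInput.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].replace(PLACEHOLDER, " ");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] tokens = split("AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KKK",
                Arrays.asList("CCC DDD", "III JJJ KKK"));
        System.out.println(Arrays.toString(tokens));
        // prints [AAA, BBB, CCC DDD, EEE, FFF, GGG, HHH, III JJJ KKK]
    }
}
```

If the input may legitimately contain underscores, a character that cannot occur in the data would be a safer placeholder.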

Will answered Apr 23 '26 01:04


Instead of using split(), you could match the tokens directly: find all runs of consecutive non-whitespace characters, but use alternation so that your specific target strings containing whitespace are tried first:

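import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The multi-word alternatives are listed first so they win over the generic \\S+ fallback.
Pattern p = Pattern.compile("CCC DDD|III JJJ KKK|\\S+");
Matcher m = p.matcher("AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KKK");
while (m.find()) {
    System.out.println(m.group()); // prints one token per line
}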

Example: http://ideone.com/AxI1CV
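Since the question asks for an array of tokens, the same matching idea could be wrapped up as in the sketch below. The class name `PhraseTokenizer` and the `tokenize` helper are made up for illustration. `Pattern.quote` is used on the assumption that the phrases should be treated as literal text, and the phrases are assumed to be passed with longer ones first when one is a prefix of another, because Java tries alternatives from left to right.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseTokenizer {
    // Hypothetical helper: builds "phrase1|phrase2|...|\S+" and collects every match.
    public static String[] tokenize(String input, List<String> phrases) {
        StringBuilder regex = new StringBuilder();
        for (String phrase : phrases) {
            // Quote each phrase so regex metacharacters in it are matched literally.
            regex.append(Pattern.quote(phrase)).append('|');
        }
        regex.append("\\S+"); // fallback: any run of non-whitespace characters

        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KKK",
                Arrays.asList("CCC DDD", "III JJJ KKK"));
        System.out.println(Arrays.toString(tokens));
        // prints [AAA, BBB, CCC DDD, EEE, FFF, GGG, HHH, III JJJ KKK]
    }
}
```

Unlike the placeholder approach, this never modifies the input, so no "safe" substitute character has to be chosen.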

Andrew Clark answered Apr 22 '26 23:04


