I'm using Java to split a String of the form:
String stringToSplit = "AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KKK";
I'm using
String[] tokens = stringToSplit.split("\\s");
to split the string along whitespace, giving:
tokens = {"AAA","BBB","CCC", "DDD","EEE","FFF","GGG","HHH","III", "JJJ", "KKK"}
What I need to do now is split along whitespace for most of them, but also keep some strings together in specific cases. For instance, I want "CCC DDD" and "III JJJ KKK" to stay as their full strings when I split. So I want my array of tokens to be:
tokens = {"AAA","BBB","CCC DDD","EEE","FFF","GGG","HHH","III JJJ KKK"}
What regex would I use? Is this possible?
You could replace "CCC DDD" with "CCC_DDD" before splitting, and then convert the underscore back to a space afterwards.
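A minimal sketch of that placeholder idea, assuming the placeholder character (here '_') never occurs in the input. The class name PlaceholderSplit and the helper splitKeeping are made up for illustration and are not part of the question:

```java
import java.util.Arrays;

public class PlaceholderSplit {
    // Hypothetical helper: swaps the spaces inside each phrase for a placeholder,
    // splits on whitespace, then turns the placeholder back into a space.
    // Assumes the placeholder character does not appear anywhere in the input.
    static String[] splitKeeping(String input, String[] phrases, char placeholder) {
        String protectedInput = input;
        for (String phrase : phrases) {
            protectedInput = protectedInput.replace(phrase, phrase.replace(' ', placeholder));
        }
        String[] tokens = protectedInput.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].replace(placeholder, ' ');
        }
        return tokens;
    }

    public static void main(String[] args) {
        String stringToSplit = "AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KKK";
        String[] tokens = splitKeeping(stringToSplit, new String[] {"CCC DDD", "III JJJ KKK"}, '_');
        System.out.println(Arrays.toString(tokens));
    }
}
```

If the input can legitimately contain underscores, a character that cannot occur in your data would be a safer placeholder.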
You might want to invest in some kind of syntax parser if you're going to be doing lots of this kind of thing.
Instead of using split(), you could use the following method where you find all consecutive non-whitespace characters, but use alternation to also match your specific target strings that contain whitespace:
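import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Try the multi-word phrases first, then fall back to any run of non-whitespace.
Pattern p = Pattern.compile("CCC DDD|III JJJ KKK|\\S+");
Matcher m = p.matcher("AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KKK");
while (m.find()) {
    System.out.println(m.group()); // one token per line
}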
Example: http://ideone.com/AxI1CV
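If you need the result as a String[] like in the question, and the list of phrases is not fixed, something along these lines might work. This is only a sketch: the class name PhraseTokenizer and the method tokenize are hypothetical, and Pattern.quote is used so that phrases containing regex metacharacters are matched literally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseTokenizer {
    // Hypothetical helper: builds "phrase1|phrase2|...|\S+" and collects every match.
    static String[] tokenize(String input, List<String> phrases) {
        StringBuilder regex = new StringBuilder();
        for (String phrase : phrases) {
            // Quote each phrase so it is treated as literal text, not as regex syntax.
            regex.append(Pattern.quote(phrase)).append('|');
        }
        regex.append("\\S+"); // fallback: any run of non-whitespace characters

        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String stringToSplit = "AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KKK";
        String[] tokens = tokenize(stringToSplit, Arrays.asList("CCC DDD", "III JJJ KKK"));
        System.out.println(Arrays.toString(tokens));
    }
}
```

Alternatives are tried left to right, so if one phrase is a prefix of another, list the longer phrase first. Also note that a phrase is matched even when more non-whitespace characters follow it directly, so add a lookahead such as (?!\\S) after each phrase if that matters for your data.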