I have the following string: "3/4Ton". I want to split it so that
word[1] = 3/4 and word[2] = Ton.
Right now my code looks like this:
Pattern p = Pattern.compile("[A-Z]{1}[a-z]+");
Matcher m = p.matcher(line);
while (m.find()) {
    System.out.println("The word --> " + m.group());
}
It does what I need when the string only has to be split on capital letters, for example:
String = MachineryInput
word[1] = Machinery , word[2] = Input
The only problem is that it does not preserve numbers, abbreviations, or sequences of capital letters that are not meant to be separate words. Could someone help me fix my regular expression?
Thanks in advance.
To split a string on capital letters, call the split() method with the following regular expression: /(?=[A-Z])/. The regular expression uses a positive lookahead assertion to split the string before each capital letter, and split() returns an array of the substrings.
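In Java the same lookahead can be passed to String.split as a plain string, without the surrounding slashes. The following is a minimal sketch; the class and method names are made up for illustration, and it assumes Java 8 or later, where a zero-width match at the start of the input does not produce a leading empty string.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // The lookahead (?=[A-Z]) matches the empty position just before each
    // capital letter, so no characters are consumed by the split.
    static String[] splitBeforeCapitals(String s) {
        return s.split("(?=[A-Z])");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBeforeCapitals("MachineryInput")));
        System.out.println(Arrays.toString(splitBeforeCapitals("3/4Ton")));
    }
}
```

Note that this splits before every capital letter, so a run of capitals such as "USA" would come back as three one-letter pieces.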
To split a string by special characters, call the split() method on the string, passing it a regular expression that matches any of the special characters as a parameter. The method will split the string on each occurrence of a special character and return an array containing the results.
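As a rough Java illustration of splitting on special characters, the sketch below lists the delimiters in a character class. The choice of delimiters (slash, hyphen, space) and the names used are assumptions made only for this example.

```java
import java.util.Arrays;

class SpecialCharSplitDemo {
    // The character class matches a slash, a literal hyphen (escaped so it
    // is not read as a range), or a space.
    static String[] splitOnSpecials(String s) {
        return s.split("[/\\- ]");
    }

    public static void main(String[] args) {
        // Prints [3, 4, Ton, truck]
        System.out.println(Arrays.toString(splitOnSpecials("3/4-Ton truck")));
    }
}
```

Unlike the lookahead approach, the delimiters themselves are consumed here, which is why "3/4" does not survive as one piece.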
Answer: The Character class provides static isUpperCase() and isLowerCase() methods to check whether a character is upper case or lower case, respectively.
You can actually do this with regex alone, using lookahead and lookbehind (see "Special constructs" on this page: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html )
// Assumes JUnit 4 is on the classpath.
import static org.junit.Assert.assertEquals;
import org.junit.Test;

/**
 * We'll use this pattern as a divider to split the string into an array.
 * Usage: myString.split(DIVIDER_PATTERN);
 */
private static final String DIVIDER_PATTERN =
        // either a character that is not an uppercase letter,
        // followed by an uppercase letter
        "(?<=[^\\p{Lu}])(?=\\p{Lu})"
        // or a lowercase letter followed by a digit
        + "|(?<=[\\p{Ll}])(?=\\d)";

@Test
public void testStringSplitting() {
    assertEquals(2, "3/4Word".split(DIVIDER_PATTERN).length);
    assertEquals(7, "ManyManyWordsInThisBigThing".split(DIVIDER_PATTERN).length);
    assertEquals(7, "This123/4Mixed567ThingIsDifficult"
            .split(DIVIDER_PATTERN).length);
}
So what you can do is something like this:
for(String word: myString.split(DIVIDER_PATTERN)){
    System.out.println(word);
}
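For reference, here is a self-contained sketch that runs the divider pattern on the strings from the question. The second constant, ACRONYM_AWARE_PATTERN, is a hypothetical extension that is not part of the answer above: it adds one more alternative so that a run of capitals is separated from a capitalized word that follows it. The class name is likewise made up for the example.

```java
import java.util.Arrays;

class DividerPatternDemo {
    // The pattern from the answer: split before an uppercase letter that
    // follows a non-uppercase character, or between a lowercase letter
    // and a digit.
    static final String DIVIDER_PATTERN =
            "(?<=[^\\p{Lu}])(?=\\p{Lu})"
            + "|(?<=\\p{Ll})(?=\\d)";

    // Hypothetical extension (an assumption, not from the answer): also
    // split between two uppercase letters when the second one starts a
    // word, i.e. is followed by a lowercase letter.
    static final String ACRONYM_AWARE_PATTERN =
            DIVIDER_PATTERN + "|(?<=\\p{Lu})(?=\\p{Lu}\\p{Ll})";

    public static void main(String[] args) {
        // Prints [3/4, Ton]
        System.out.println(Arrays.toString("3/4Ton".split(DIVIDER_PATTERN)));
        // Prints [Machinery, Input]
        System.out.println(Arrays.toString("MachineryInput".split(DIVIDER_PATTERN)));
        // Prints [Machine, USAInput]
        System.out.println(Arrays.toString("MachineUSAInput".split(DIVIDER_PATTERN)));
        // Prints [Machine, USA, Input]
        System.out.println(Arrays.toString("MachineUSAInput".split(ACRONYM_AWARE_PATTERN)));
    }
}
```

Whether the extra alternative is wanted depends on the data: it helps with names like "MachineUSAInput", but it would also split a deliberate spelling such as "IPhone" into "I" and "Phone".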
Sean
Using regex would be nice here. I bet there's a way to do it too, although I'm not a swing-in-on-a-vine regex guy so I can't help you. However, there's something you can't avoid - something, somewhere needs to loop over your String eventually. You could do this "on your own" like so:
// Requires: import java.util.ArrayList;
static String[] splitOnCapitals(String str) {
    ArrayList<String> array = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        // an uppercase character closes the current word and starts a new one
        if (Character.isUpperCase(str.charAt(i))) {
            String line = builder.toString().trim();
            if (line.length() > 0) array.add(line);
            builder = new StringBuilder();
        }
        builder.append(str.charAt(i));
    }
    // get the last little bit too, unless it is empty
    String last = builder.toString().trim();
    if (last.length() > 0) array.add(last);
    return array.toArray(new String[0]);
}
I tested it with the following test driver:
public static void main(String[] args) {
    String test = "3/4 Ton truCk";
    String[] arr = splitOnCapitals(test);
    for (String s : arr) System.out.println(s);
    test = "Start with Capital";
    arr = splitOnCapitals(test);
    for (String s : arr) System.out.println(s);
}
And got the following output:
3/4
Ton tru
Ck
Start with
Capital
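The loop above starts a new word at every capital letter, so a run such as "USA" would be broken into single letters. One possible variation, sketched below with made-up names, starts a new word only when the capital letter follows a character that is not itself uppercase. It is an illustration of the idea rather than a drop-in replacement, and unlike the version above it does not trim whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class CapitalRunSplitter {
    // Starts a new word at an uppercase letter only if the previous
    // character is not uppercase, so runs of capitals stay together.
    static List<String> splitKeepingCapitalRuns(String str) {
        List<String> words = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            boolean startsWord = Character.isUpperCase(c)
                    && i > 0
                    && !Character.isUpperCase(str.charAt(i - 1));
            if (startsWord && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingCapitalRuns("3/4Ton"));         // [3/4, Ton]
        System.out.println(splitKeepingCapitalRuns("MachineryInput")); // [Machinery, Input]
        System.out.println(splitKeepingCapitalRuns("USAToday"));       // [USAToday]
    }
}
```

As the last line shows, this keeps "USA" intact but cannot tell where the acronym ends and "Today" begins; that needs an extra rule based on the character after the capital.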