
Split a string based on a pattern in Java - capital letters and numbers

Tags: java, string, regex

I have the string "3/4Ton" and want to split it like this:

word[1] = 3/4 and word[2] = Ton.

Right now my code looks like this:

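// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern p = Pattern.compile("[A-Z]{1}[a-z]+");
Matcher m = p.matcher(line); // line holds the input string
while (m.find()) {
    System.out.println("The word --> " + m.group());
}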

It splits the string on capital letters as needed, for example:

String = MachineryInput

word[1] = Machinery , word[2] = Input

The only problem is that it does not preserve numbers, abbreviations, or sequences of capital letters that are not meant to be separate words. Could someone help me with this regular expression problem?

Thanks in advance...

leba-lev asked May 17 '10 15:05



People also ask

How do you split a string at every capital letter?

To split a string on capital letters, call the split() method with the regular expression (?=[A-Z]). The regular expression uses a positive lookahead assertion to split the string before each capital letter, and split() returns an array of the substrings.
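
As a rough illustration in Java, the lookahead split could look like the sketch below. The class name CapitalSplitSketch and the helper splitBeforeCapitals are made up for this example. It assumes Java 8 or later, where a zero-width match at the start of the input does not produce a leading empty string; older versions return an extra "" as the first element.

```java
import java.util.Arrays;

class CapitalSplitSketch {
    // Hypothetical helper: splits at the empty position before each ASCII capital.
    // (?=[A-Z]) is a zero-width positive lookahead, so no characters are consumed.
    static String[] splitBeforeCapitals(String s) {
        return s.split("(?=[A-Z])");
    }

    public static void main(String[] args) {
        // Output shown for Java 8 or later.
        System.out.println(Arrays.toString(splitBeforeCapitals("MachineryInput"))); // [Machinery, Input]
        System.out.println(Arrays.toString(splitBeforeCapitals("3/4Ton")));         // [3/4, Ton]
        // Every capital starts a new piece, so acronyms are broken apart.
        System.out.println(Arrays.toString(splitBeforeCapitals("USBPort")));        // [U, S, B, Port]
    }
}
```

The last line shows the limitation: a plain lookahead on capitals breaks up runs of capital letters, so abbreviations need an extra lookbehind condition.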

How can you split a character having the combination of string special characters and numbers?

To split a string by special characters, call the split() method on the string, passing it a regular expression that matches any of the special characters as a parameter. The method will split the string on each occurrence of a special character and return an array containing the results.

How do you separate lowercase and uppercase in Java?

Answer: The Character class provides the static methods isUpperCase() and isLowerCase() to check whether a character is upper case or lower case, respectively.
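
A minimal sketch of how these checks might be used to separate the two cases follows. The methods are static members of java.lang.Character; the class name CaseSeparationSketch and the helper separateByCase are invented for illustration.

```java
class CaseSeparationSketch {
    // Hypothetical helper: returns {uppercase letters, lowercase letters}.
    // Digits, punctuation and other characters are skipped.
    static String[] separateByCase(String s) {
        StringBuilder upper = new StringBuilder();
        StringBuilder lower = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isUpperCase(c)) {
                upper.append(c);
            } else if (Character.isLowerCase(c)) {
                lower.append(c);
            }
        }
        return new String[] { upper.toString(), lower.toString() };
    }

    public static void main(String[] args) {
        String[] parts = separateByCase("3/4Ton");
        System.out.println("upper: " + parts[0]); // upper: T
        System.out.println("lower: " + parts[1]); // lower: on
    }
}
```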


2 Answers

You can actually do this in regex alone using lookahead and lookbehind (see "Special constructs" on this page: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html )

/**
 * We'll use this pattern as a divider to split the string into an array.
 * Usage: myString.split(DIVIDER_PATTERN);
 */
private static final String DIVIDER_PATTERN =
        // either a character that is not an uppercase letter,
        // followed by an uppercase letter
        "(?<=[^\\p{Lu}])(?=\\p{Lu})"
        // or a lowercase letter followed by a digit
        + "|(?<=[\\p{Ll}])(?=\\d)";

@Test
public void testStringSplitting() {
    assertEquals(2, "3/4Word".split(DIVIDER_PATTERN).length);
    assertEquals(7, "ManyManyWordsInThisBigThing".split(DIVIDER_PATTERN).length);
    assertEquals(7, "This123/4Mixed567ThingIsDifficult"
                        .split(DIVIDER_PATTERN).length);
}

So what you can do is something like this:

for(String word: myString.split(DIVIDER_PATTERN)){
    System.out.println(word);
}
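
To see what the pieces actually look like, here is a small self-contained sketch that repeats the pattern and prints the result for the strings from the question. The class name DividerPatternDemo, the helper splitWords and the "USBPort" input are assumptions added for illustration only.

```java
import java.util.Arrays;

class DividerPatternDemo {
    // Same pattern as above, repeated so the sketch compiles on its own.
    static final String DIVIDER_PATTERN =
            "(?<=[^\\p{Lu}])(?=\\p{Lu})"   // non-uppercase followed by uppercase
            + "|(?<=[\\p{Ll}])(?=\\d)";    // lowercase followed by a digit

    // Hypothetical helper wrapping String.split.
    static String[] splitWords(String s) {
        return s.split(DIVIDER_PATTERN);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("3/4Ton")));         // [3/4, Ton]
        System.out.println(Arrays.toString(splitWords("MachineryInput"))); // [Machinery, Input]
        // A run of capitals has no non-uppercase character before it,
        // so it stays attached to the word that follows.
        System.out.println(Arrays.toString(splitWords("USBPort")));        // [USBPort]
    }
}
```

Because both alternatives require a lookbehind character, the pattern never matches at position 0, so no empty leading element appears regardless of Java version.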

Sean

Sean Patrick Floyd answered Oct 28 '22 01:10



Using regex would be nice here. I bet there's a way to do it too, although I'm not a swing-in-on-a-vine regex guy so I can't help you. However, there's something you can't avoid - something, somewhere needs to loop over your String eventually. You could do this "on your own" like so:

// needs: import java.util.ArrayList; import java.util.List;
static String[] splitOnCapitals(String str) {
    List<String> parts = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        // an uppercase character closes the current piece and starts a new one
        if (Character.isUpperCase(str.charAt(i))) {
            String piece = builder.toString().trim();
            if (piece.length() > 0) parts.add(piece);
            builder = new StringBuilder();
        }
        builder.append(str.charAt(i));
    }
    String last = builder.toString().trim(); // get the last little bit too
    if (last.length() > 0) parts.add(last);
    return parts.toArray(new String[0]);
}

I tested it with the following test driver:

public static void main(String[] args) {
    String test = "3/4 Ton truCk";
    String[] arr = splitOnCapitals(test);
    for(String s : arr) System.out.println(s);

    test = "Start with Capital";
    arr = splitOnCapitals(test);
    for(String s : arr) System.out.println(s);
}

And got the following output:

3/4
Ton tru
Ck
Start with
Capital
corsiKa answered Oct 28 '22 01:10
