Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split a string between letters and digits (or between digits and letters)?

Tags:

java

string

regex

You could try to split on (?<=\D)(?=\d)|(?<=\d)(?=\D), like:

str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

It matches positions between a number and not-a-number (in any order).

  • (?<=\D)(?=\d) - matches a position between a non-digit (\D) and a digit (\d)
  • (?<=\d)(?=\D) - matches a position between a digit and a non-digit.

How about:

private List<String> Parse(String str) {
    List<String> output = new ArrayList<String>();
    Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
    while (match.find()) {
        output.add(match.group());
    }
    return output;
}

You can try this:

Pattern p = Pattern.compile("[a-z]+|\\d+");
Matcher m = p.matcher("123abc345def");
ArrayList<String> allMatches = new ArrayList<>();
while (m.find()) {
    allMatches.add(m.group());
}

The result (allMatches) will be:

["123", "abc", "345", "def"]

Use two different patterns: [0-9]* and [a-zA-Z]* and split twice by each of them.


If you are looking for solution without using Java String functionality (i.e. split, match, etc.) then the following should help:

List<String> splitString(String string) {
        List<String> list = new ArrayList<String>();
        String token = "";
        char curr;
        for (int e = 0; e < string.length() + 1; e++) {
            if (e == 0)
                curr = string.charAt(0);
            else {
                curr = string.charAt(--e);
            }

            if (isNumber(curr)) {
                while (e < string.length() && isNumber(string.charAt(e))) {
                    token += string.charAt(e++);
                }
                list.add(token);
                token = "";
            } else {
                while (e < string.length() && !isNumber(string.charAt(e))) {
                    token += string.charAt(e++);
                }
                list.add(token);
                token = "";
            }

        }

        return list;
    }

boolean isNumber(char c) {
        return c >= '0' && c <= '9';
    }

This solution will split numbers and 'words', where 'words' are strings that don't contain numbers. However, if you like to have only 'words' containing English letters then you can easily modify it by adding more conditions (like isNumber method call) depending on your requirements (for example you may wish to skip words that contain non English letters). Also note that the splitString method returns ArrayList which later can be converted to String array.