Split a string with multiple delimiters using only String methods

Tags: java, tokenize

I ripped off another Stack Overflow question, "Equivalent to StringTokenizer with multiple characters delimiters", but I want to know if this can be done with only String methods (.equals(), .startsWith(), etc.). I don't want to use regexes, the StringTokenizer class, Patterns, Matchers or anything other than String for that matter.

For example, this is how I want to call the method:

    String[] delimiters = {" ", "==", "=", "+", "+=", "++", "-", "-=", "--", "/", "/=", "*", "*=", "(", ")", ";", "/**", "*/", "\t", "\n"};
    String[] splitString = tokenizer(contents, delimiters);

And this is the code I ripped off from the other question (I don't want to do it this way).

    private String[] tokenizer(String string, String[] delimiters) {
        // First, create a regular expression that matches the union of the
        // delimiters. Be aware that when one delimiter contains another
        // (for example && and &), the longer one must come before the
        // shorter one (&& before &), or the regex parser will recognize
        // && as two &.
        Arrays.sort(delimiters, new Comparator<String>() {
            @Override
            public int compare(String o1, String o2) {
                return -o1.compareTo(o2);
            }
        });
        // Build a string that will contain the regular expression
        StringBuilder regexpr = new StringBuilder();
        regexpr.append('(');
        for (String delim : delimiters) { // For each delimiter
            if (regexpr.length() != 1)
                regexpr.append('|'); // Add union separator if needed
            for (int i = 0; i < delim.length(); i++) {
                // Escape every character with a backslash so that regexp
                // reserved chars are matched literally (only safe for
                // non-alphanumeric delimiters)
                regexpr.append('\\');
                regexpr.append(delim.charAt(i));
            }
        }
        regexpr.append(')'); // Close the union
        Pattern p = Pattern.compile(regexpr.toString());

        // Now, search for the tokens
        List<String> res = new ArrayList<String>();
        Matcher m = p.matcher(string);
        int pos = 0;
        while (m.find()) { // While there's a delimiter in the string
            if (pos != m.start()) {
                // If there's something between the current and the previous
                // delimiter, add it to the tokens list
                res.add(string.substring(pos, m.start()));
            }
            res.add(m.group()); // add the delimiter
            pos = m.end(); // Remember end of delimiter
        }
        if (pos != string.length()) {
            // If some characters remain in the string after the last
            // delimiter, add them to the token list
            res.add(string.substring(pos));
        }
        // Return the result
        return res.toArray(new String[res.size()]);
    }
    public static String[] clean(final String[] v) {
        List<String> list = new ArrayList<String>(Arrays.asList(v));
        list.removeAll(Collections.singleton(" "));
        return list.toArray(new String[list.size()]);
    }

Edit: I ONLY want to use string methods charAt, equals, equalsIgnoreCase, indexOf, length, and substring

360

asked Oct 31 '15 17:10

Aditya Ramkumar

2 Answers

EDIT: My original answer did not quite do the trick: it did not include the delimiters in the resulting array, and it used the String.split() method, which was not allowed.

Here's my new solution, which is split into 2 methods:

/**
 * Splits the string at all specified literal delimiters, and includes the delimiters in the resulting array
 */
private static String[] tokenizer(String subject, String[] delimiters)  { 

    //Sort delimiters into length order, starting with longest
    Arrays.sort(delimiters, new Comparator<String>() {
        @Override
        public int compare(String s1, String s2) {
          return s2.length()-s1.length();
         }
      });

    //start with a list with only one string - the whole thing
    List<String> tokens = new ArrayList<String>();
    tokens.add(subject);

    //loop through the delimiters, splitting on each one
    for (int i=0; i<delimiters.length; i++) {
        tokens = splitStrings(tokens, delimiters, i);
    }

    return tokens.toArray(new String[] {});
}

/**
 * Splits each String in the subject at the delimiter
 */
private static List<String> splitStrings(List<String> subject, String[] delimiters, int delimiterIndex) {

    List<String> result = new ArrayList<String>();
    String delimiter = delimiters[delimiterIndex];

    //for each input string
    for (String part : subject) {

        int start = 0;

        //if this part equals one of the delimiters, don't split it up any more
        boolean alreadySplit = false;
        for (String testDelimiter : delimiters) {
            if (testDelimiter.equals(part)) {
                alreadySplit = true;
                break;
            }
        }

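        if (!alreadySplit && delimiter.length()>0) {
            for (int index=0; index<part.length(); index++) {
                String subPart = part.substring(index);
                if (subPart.indexOf(delimiter)==0) {
                    if (index>start) {
                        result.add(part.substring(start, index)); // part before delimiter, skipped when empty
                    }
                    result.add(delimiter);                        // delimiter
                    start = index+delimiter.length();             // next part starts after delimiter
                    index = start-1;                              // resume after the delimiter so overlapping matches (e.g. "++" inside "+++") are not matched twice
                }
            }
        }
        if (start<part.length()) {
            result.add(part.substring(start));                    // rest of string after last delimiter, skipped when empty
        }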
    }
    return result;
}
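
One side effect worth knowing about: Arrays.sort sorts in place, so tokenizer above reorders the delimiters array that the caller passed in. If that matters, a copy can be sorted instead. The following is only a minimal sketch; the class name DelimiterOrder and the method longestFirst are made up for illustration and are not part of the answer's code.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original answer.
class DelimiterOrder {

    // Returns a copy of the delimiters sorted longest first.
    // The array passed in is left untouched.
    static String[] longestFirst(String[] delimiters) {
        String[] sorted = delimiters.clone();
        // Arrays.sort on objects is stable, so delimiters of equal length
        // keep their original relative order.
        Arrays.sort(sorted, (a, b) -> Integer.compare(b.length(), a.length()));
        return sorted;
    }

    public static void main(String[] args) {
        String[] delimiters = {"=", "==", "/**", "+"};
        System.out.println(Arrays.toString(longestFirst(delimiters))); // [/**, ==, =, +]
        System.out.println(Arrays.toString(delimiters));               // [=, ==, /**, +]
    }
}
```

Sorting longest first is what keeps "==" from being split into two "=" tokens, because the longer delimiter is split out before the shorter one is tried.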

Original Answer

I notice you are using Pattern when you said you only wanted to use String methods.

The approach I would take would be to think of the simplest way possible. I think that is to first replace all the possible delimiters with just one delimiter, and then do the split.

Here's the code:

private String[] tokenizer(String string, String[] delimiters)  {

    //replace all specified delimiters with one
    for (String delimiter : delimiters) {
        string = string.replace(delimiter, "{split}");
    }

    //now split at the new delimiter
    return string.split("\\{split\\}");

}

I need to use String.replace() and not String.replaceAll() because replace() takes literal text and replaceAll() takes a regex argument, and the delimiters supplied are literal text.

replace() already replaces every occurrence of the delimiter, so no extra while loop is needed around it.
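
To make the difference between the two methods concrete, here is a small sketch (the class and method names are made up for illustration). It uses "." as the delimiter because that character has a special meaning in a regex.

```java
// Hypothetical demo class, not part of the original answer.
class ReplaceVsReplaceAll {

    // String.replace(CharSequence, CharSequence): the target is literal text,
    // and every occurrence is replaced in a single call.
    static String literalReplace(String text) {
        return text.replace(".", "|");
    }

    // String.replaceAll(String, String): the first argument is a regex,
    // and an unescaped "." matches any character.
    static String regexReplace(String text) {
        return text.replaceAll(".", "|");
    }

    public static void main(String[] args) {
        System.out.println(literalReplace("a.b.c")); // a|b|c
        System.out.println(regexReplace("a.b.c"));   // |||||
    }
}
```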

116

answered Oct 21 '22 04:10

NickJ

Using only non-regex String methods... I used the startsWith(...) method, which isn't in the list of methods you allowed, but it does a plain string comparison rather than a regex comparison.

The following implementation:

public static void main(String ... params) {
    String haystack = "abcdefghijklmnopqrstuvwxyz";
    String [] needles = new String [] { "def", "tuv" };
    String [] tokens = splitIntoTokensUsingNeedlesFoundInHaystack(haystack, needles);
    for (String string : tokens) {
        System.out.println(string);
    }
}

private static String[] splitIntoTokensUsingNeedlesFoundInHaystack(String haystack, String[] needles) {
    List<String> list = new LinkedList<String>();
    StringBuilder builder = new StringBuilder();
    for(int haystackIndex = 0; haystackIndex < haystack.length(); haystackIndex++) {
        boolean foundAnyNeedle = false;
        String substring = haystack.substring(haystackIndex);
        for(int needleIndex = 0; (!foundAnyNeedle) && needleIndex < needles.length; needleIndex ++) {
            String needle = needles[needleIndex];
            if(substring.startsWith(needle)) {
                if(builder.length() > 0) {
                    list.add(builder.toString());
                    builder = new StringBuilder();
                }
                foundAnyNeedle = true;
                list.add(needle);
                haystackIndex += (needle.length() - 1);
            }
        }
        if( ! foundAnyNeedle) {
            builder.append(substring.charAt(0));
        }
    }
    if(builder.length() > 0) {
        list.add(builder.toString());
    }
    return list.toArray(new String[]{});
}

outputs

abc
def
ghijklmnopqrs
tuv
wxyz

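Note... This code is demo-only. If one of the delimiters is an empty String, the loop never advances: it keeps adding empty tokens to the list and eventually crashes with OutOfMemoryError: Java heap space after consuming a lot of CPU. The needles are also tried in the order given, so when one delimiter is a prefix of another (such as "+" and "+="), the longer one must be listed first.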
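
For completeness, here is a sketch of the same single-pass idea that stays inside the methods the question allows (equals, length and substring) and uses a plain String[] instead of a List. It is an illustration under assumptions, not a drop-in replacement: the names StringOnlyTokenizer, longestDelimiterAt and tokenize are made up, empty delimiters are ignored, and the longest delimiter matching at each position wins regardless of the order of the array.

```java
// Hypothetical sketch, not taken from either answer.
class StringOnlyTokenizer {

    // Returns the longest delimiter that occurs in text exactly at pos,
    // or null if none does. Empty delimiters are ignored.
    static String longestDelimiterAt(String text, int pos, String[] delimiters) {
        String best = null;
        for (String d : delimiters) {
            int end = pos + d.length();
            if (d.length() > 0 && end <= text.length()
                    && text.substring(pos, end).equals(d)
                    && (best == null || d.length() > best.length())) {
                best = d;
            }
        }
        return best;
    }

    // Splits text at the delimiters and keeps the delimiters as tokens.
    static String[] tokenize(String text, String[] delimiters) {
        // Every token has at least one character, so text.length() slots suffice.
        String[] buffer = new String[text.length()];
        int count = 0;
        int tokenStart = 0; // start of the pending non-delimiter token
        int pos = 0;
        while (pos < text.length()) {
            String d = longestDelimiterAt(text, pos, delimiters);
            if (d == null) {
                pos++; // ordinary character, keep scanning
                continue;
            }
            if (pos > tokenStart) {
                buffer[count++] = text.substring(tokenStart, pos); // text before delimiter
            }
            buffer[count++] = d;
            pos += d.length();
            tokenStart = pos;
        }
        if (tokenStart < text.length()) {
            buffer[count++] = text.substring(tokenStart); // text after last delimiter
        }
        // Copy the used part of the buffer into an array of the right size.
        String[] result = new String[count];
        for (int i = 0; i < count; i++) {
            result[i] = buffer[i];
        }
        return result;
    }

    public static void main(String[] args) {
        String[] delimiters = {" ", "==", "=", "+", "+=", "++", ";"};
        for (String token : tokenize("a+=b++ == c;", delimiters)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Picking the longest match at each position removes the need to sort the delimiters, at the cost of checking every delimiter at every position.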

answered Oct 21 '22 02:10

Nathan
