Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to trim a java stringbuilder?

I have a StringBuilder object that needs to be trimmed (i.e. all whitespace chars /u0020 and below removed from either end).

I can't seem to find a method in string builder that would do this.

Here's what I'm doing now:

String trimmedStr = strBuilder.toString().trim();

This gives exactly the desired output, but it requires two Strings to be allocated instead of one. Is there a more efficient to trim the string while it's still in the StringBuilder?

like image 261
CodeFusionMobile Avatar asked Mar 06 '11 19:03

CodeFusionMobile


People also ask

How do I remove the end of a StringBuilder?

The idea is to use the deleteCharAt() method of StringBuilder class to remove first and the last character of a string. The deleteCharAt() method accepts a parameter as an index of the character you want to remove.

How do I remove spaces from a string builder?

The trim() method removes all white space at the end of the string. That's correct.

How do you trim something in Java?

The trim() method in Java String is a built-in function that eliminates leading and trailing spaces. The Unicode value of space character is '\u0020'. The trim() method in java checks this Unicode value before and after the string, if it exists then removes the spaces and returns the omitted string.


2 Answers

You should not use the deleteCharAt approach.

As Boris pointed out, the deleteCharAt method copies the array over every time. The code in the Java 5 that does this looks like this:

public AbstractStringBuilder deleteCharAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    System.arraycopy(value, index+1, value, index, count-index-1);
    count--;
    return this;
}

Of course, speculation alone is not enough to choose one method of optimization over another, so I decided to time the 3 approaches in this thread: the original, the delete approach, and the substring approach.

Here is the code I tested for the orignal:

public static String trimOriginal(StringBuilder sb) {
    return sb.toString().trim();
}

The delete approach:

public static String trimDelete(StringBuilder sb) {
    while (sb.length() > 0 && Character.isWhitespace(sb.charAt(0))) {
        sb.deleteCharAt(0);
    }
    while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1))) {
        sb.deleteCharAt(sb.length() - 1);
    }
    return sb.toString();
}

And the substring approach:

public static String trimSubstring(StringBuilder sb) {
    int first, last;

    for (first=0; first<sb.length(); first++)
        if (!Character.isWhitespace(sb.charAt(first)))
            break;

    for (last=sb.length(); last>first; last--)
        if (!Character.isWhitespace(sb.charAt(last-1)))
            break;

    return sb.substring(first, last);
}

I performed 100 tests, each time generating a million-character StringBuffer with ten thousand trailing and leading spaces. The testing itself is very basic, but it gives a good idea of how long the methods take.

Here is the code to time the 3 approaches:

public static void main(String[] args) {

    long originalTime = 0;
    long deleteTime = 0;
    long substringTime = 0;

    for (int i=0; i<100; i++) {

        StringBuilder sb1 = new StringBuilder();
        StringBuilder sb2 = new StringBuilder();
        StringBuilder sb3 = new StringBuilder();

        for (int j=0; j<10000; j++) {
            sb1.append(" ");
            sb2.append(" ");
            sb3.append(" ");
        }
        for (int j=0; j<980000; j++) {
            sb1.append("a");
            sb2.append("a");
            sb3.append("a");
        }
        for (int j=0; j<10000; j++) {
            sb1.append(" ");
            sb2.append(" ");
            sb3.append(" ");
        }

        long timer1 = System.currentTimeMillis();
        trimOriginal(sb1);
        originalTime += System.currentTimeMillis() - timer1;

        long timer2 = System.currentTimeMillis();
        trimDelete(sb2);
        deleteTime += System.currentTimeMillis() - timer2;

        long timer3 = System.currentTimeMillis();
        trimSubstring(sb3);
        substringTime += System.currentTimeMillis() - timer3;
    }

    System.out.println("original:  " + originalTime + " ms");
    System.out.println("delete:    " + deleteTime + " ms");
    System.out.println("substring: " + substringTime + " ms");
}

I got the following output:

original:  176 ms
delete:    179242 ms
substring: 154 ms

As we see, the substring approach provides a very slight optimization over the original "two String" approach. However, the delete approach is extremely slow and should be avoided.

So to answer your question: you are fine trimming your StringBuilder the way you suggested in the question. The very slight optimization that the substring method offers probably does not justify the excess code.

like image 151
Zaven Nahapetyan Avatar answered Nov 16 '22 02:11

Zaven Nahapetyan


I've used Zaven's analysis approach and StringBuilder's delete(start, end) method which performs far better than the deleteCharAt(index) approach, but slightly worse than the substring() approach. This method also uses the array copy, but array copy is called far fewer times (only twice in the worst case). In addition, this avoids creating multiple instances of intermediate Strings in case trim() is called repeatedly on the same StringBuilder object.

public class Main {

    public static String trimOriginal(StringBuilder sb) {
        return sb.toString().trim();
    }

    public static String trimDeleteRange(StringBuilder sb) {
        int first, last;

        for (first = 0; first < sb.length(); first++)
            if (!Character.isWhitespace(sb.charAt(first)))
                break;

        for (last = sb.length(); last > first; last--)
            if (!Character.isWhitespace(sb.charAt(last - 1)))
                break;

        if (first == last) {
            sb.delete(0, sb.length());
        } else {
           if (last < sb.length()) {
              sb.delete(last, sb.length());
           }
           if (first > 0) {
              sb.delete(0, first);
           }
        }
        return sb.toString();
    }


    public static String trimSubstring(StringBuilder sb) {
        int first, last;

        for (first = 0; first < sb.length(); first++)
            if (!Character.isWhitespace(sb.charAt(first)))
                break;

        for (last = sb.length(); last > first; last--)
            if (!Character.isWhitespace(sb.charAt(last - 1)))
                break;

        return sb.substring(first, last);
    }

    public static void main(String[] args) {
        runAnalysis(1000);
        runAnalysis(10000);
        runAnalysis(100000);
        runAnalysis(200000);
        runAnalysis(500000);
        runAnalysis(1000000);
    }

    private static void runAnalysis(int stringLength) {
        System.out.println("Main:runAnalysis(string-length=" + stringLength + ")");

        long originalTime = 0;
        long deleteTime = 0;
        long substringTime = 0;

        for (int i = 0; i < 200; i++) {

            StringBuilder temp = new StringBuilder();
            char[] options = {' ', ' ', ' ', ' ', 'a', 'b', 'c', 'd'};
            for (int j = 0; j < stringLength; j++) {
                temp.append(options[(int) ((Math.random() * 1000)) % options.length]);
            }
            String testStr = temp.toString();

            StringBuilder sb1 = new StringBuilder(testStr);
            StringBuilder sb2 = new StringBuilder(testStr);
            StringBuilder sb3 = new StringBuilder(testStr);

            long timer1 = System.currentTimeMillis();
            trimOriginal(sb1);
            originalTime += System.currentTimeMillis() - timer1;

            long timer2 = System.currentTimeMillis();
            trimDeleteRange(sb2);
            deleteTime += System.currentTimeMillis() - timer2;

            long timer3 = System.currentTimeMillis();
            trimSubstring(sb3);
            substringTime += System.currentTimeMillis() - timer3;
        }

        System.out.println("  original:     " + originalTime + " ms");
        System.out.println("  delete-range: " + deleteTime + " ms");
        System.out.println("  substring:    " + substringTime + " ms");
    }

}

Output:

Main:runAnalysis(string-length=1000)
  original:     0 ms
  delete-range: 4 ms
  substring:    0 ms
Main:runAnalysis(string-length=10000)
  original:     4 ms
  delete-range: 9 ms
  substring:    4 ms
Main:runAnalysis(string-length=100000)
  original:     22 ms
  delete-range: 33 ms
  substring:    43 ms
Main:runAnalysis(string-length=200000)
  original:     57 ms
  delete-range: 93 ms
  substring:    110 ms
Main:runAnalysis(string-length=500000)
  original:     266 ms
  delete-range: 220 ms
  substring:    191 ms
Main:runAnalysis(string-length=1000000)
  original:     479 ms
  delete-range: 467 ms
  substring:    426 ms
like image 20
shams Avatar answered Nov 16 '22 01:11

shams