
Why is String.strip() 5 times faster than String.trim() for a blank string in Java 11?

I've encountered an interesting scenario. For some reason, strip() on a blank string (one that contains only whitespace) is significantly faster than trim() in Java 11.

Benchmark

import org.openjdk.jmh.annotations.*;

import static java.util.concurrent.TimeUnit.MILLISECONDS;

public class Test {

    public static final String TEST_STRING = "   "; // 3 whitespaces

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testTrim() {
        TEST_STRING.trim();
    }

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testStrip() {
        TEST_STRING.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}

Results

# Run complete. Total time: 00:04:16

Benchmark        Mode  Cnt           Score          Error  Units
Test.testStrip  thrpt  200  2067457963.295 ± 12353310.918  ops/s
Test.testTrim   thrpt  200   402307182.894 ±  4559641.554  ops/s

Apparently, strip() outperforms trim() by a factor of about 5.
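As a quick sanity check of the "about 5" figure, the ratio can be computed directly from the two scores in the results table. This is only arithmetic on the reported numbers; the class and method names are made up for illustration.

```java
// Hypothetical helper: computes the throughput ratio from the JMH scores quoted above.
final class ThroughputRatio {

    // Scores copied from the results table (ops/s).
    static final double STRIP_OPS = 2067457963.295;
    static final double TRIM_OPS = 402307182.894;

    // Ratio of the faster throughput to the slower one.
    static double ratio(double faster, double slower) {
        return faster / slower;
    }

    public static void main(String[] args) {
        System.out.printf("strip/trim throughput ratio: %.2f%n", ratio(STRIP_OPS, TRIM_OPS));
    }
}
```

The result comes out slightly above 5, which matches the rough estimate.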

For a non-blank string, however, the results are almost identical:

import org.openjdk.jmh.annotations.*;

import static java.util.concurrent.TimeUnit.MILLISECONDS;

public class Test {

    public static final String TEST_STRING = " Test String ";

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testTrim() {
        TEST_STRING.trim();
    }

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testStrip() {
        TEST_STRING.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}


# Run complete. Total time: 00:04:16

Benchmark        Mode  Cnt          Score         Error  Units
Test.testStrip  thrpt  200  126939018.461 ± 1462665.695  ops/s
Test.testTrim   thrpt  200  141868439.680 ± 1243136.707  ops/s

How come? Is this a bug or am I doing it wrong?


Testing environment

  • CPU - Intel Xeon E3-1585L v5 @3.00 GHz
  • OS - Windows 7 SP 1 64-bit
  • JVM - Oracle JDK 11.0.1
  • Benchmark - JMH v 1.19

Update

Added more performance tests for different Strings (empty, blank, etc).

Benchmark

import org.openjdk.jmh.annotations.*;

import static java.util.concurrent.TimeUnit.SECONDS;

@Warmup(iterations = 5, time = 1, timeUnit = SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = SECONDS)
@Fork(value = 3)
@BenchmarkMode(Mode.Throughput)
public class Test {

    private static final String BLANK = "";              // Empty string (length 0), despite the name
    private static final String EMPTY = "   ";           // 3 spaces: a blank string, despite the name
    private static final String ASCII = "   abc    ";    // ASCII characters only
    private static final String UNICODE = "   абв    ";  // Russian Characters

    private static final String BIG = EMPTY.concat("Test".repeat(100)).concat(EMPTY);

    @Benchmark
    public void blankTrim() {
        BLANK.trim();
    }

    @Benchmark
    public void blankStrip() {
        BLANK.strip();
    }

    @Benchmark
    public void emptyTrim() {
        EMPTY.trim();
    }

    @Benchmark
    public void emptyStrip() {
        EMPTY.strip();
    }

    @Benchmark
    public void asciiTrim() {
        ASCII.trim();
    }

    @Benchmark
    public void asciiStrip() {
        ASCII.strip();
    }

    @Benchmark
    public void unicodeTrim() {
        UNICODE.trim();
    }

    @Benchmark
    public void unicodeStrip() {
        UNICODE.strip();
    }

    @Benchmark
    public void bigTrim() {
        BIG.trim();
    }

    @Benchmark
    public void bigStrip() {
        BIG.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}

Results

# Run complete. Total time: 00:05:23

Benchmark           Mode  Cnt           Score          Error  Units
Test.asciiStrip    thrpt   15   356846913.133 ±  4096617.178  ops/s
Test.asciiTrim     thrpt   15   371319467.629 ±  4396583.099  ops/s
Test.bigStrip      thrpt   15    29058105.304 ±  1909323.104  ops/s
Test.bigTrim       thrpt   15    28529199.298 ±  1794655.012  ops/s
Test.blankStrip    thrpt   15  1556405453.206 ± 67230630.036  ops/s
Test.blankTrim     thrpt   15  1587932109.069 ± 19457780.528  ops/s
Test.emptyStrip    thrpt   15  2126290275.733 ± 23402906.719  ops/s
Test.emptyTrim     thrpt   15   406354680.805 ± 14359067.902  ops/s
Test.unicodeStrip  thrpt   15    37320438.099 ±   399421.799  ops/s
Test.unicodeTrim   thrpt   15    88226653.577 ±  1628179.578  ops/s

The testing environment is the same.

There is only one other interesting finding: a string that contains Unicode characters is trim()'ed faster than it is strip()'ed.

asked Dec 05 '18 20:12 by Mikhail Kholodkov



3 Answers

On OpenJDK 11.0.1 String.strip() (actually StringLatin1.strip()) optimizes stripping to an empty String by returning an interned String constant:

public static String strip(byte[] value) {
    int left = indexOfNonWhitespace(value);
    if (left == value.length) {
        return "";
    }
    // ... (remainder of the method omitted)
}

while String.trim() (actually StringLatin1.trim()) allocates a new String object whenever it trims anything, even if nothing is left. In your example st = 3 and len = 3, so

return ((st > 0) || (len < value.length)) ?
        newString(value, st, len - st) : null;

will, under the hood, copy the array and create a new String object:

return new String(Arrays.copyOfRange(val, index, index + len),
                      LATIN1);
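To make the difference concrete, here is a simplified model of the two code paths described above. It is not the JDK source: the class name and the sketchStrip/sketchTrim methods are invented for illustration, and unlike the real methods this sketch also allocates when there is nothing to remove. The point is only the early return of "" versus the unconditional array copy and allocation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified, hypothetical model of the Latin-1 code paths; not the actual JDK implementation.
final class TrimVsStripSketch {

    // Strip-like path: if every byte is whitespace, return the "" literal without allocating.
    static String sketchStrip(byte[] value) {
        int left = 0;
        while (left < value.length && Character.isWhitespace((char) (value[left] & 0xff))) {
            left++;
        }
        if (left == value.length) {
            return ""; // shared constant from the string pool
        }
        int right = value.length;
        while (right > left && Character.isWhitespace((char) (value[right - 1] & 0xff))) {
            right--;
        }
        return new String(Arrays.copyOfRange(value, left, right), StandardCharsets.ISO_8859_1);
    }

    // Trim-like path: copies the range and allocates a String, even when the range is empty.
    static String sketchTrim(byte[] value) {
        int len = value.length;
        int st = 0;
        while (st < len && (value[st] & 0xff) <= ' ') {
            st++;
        }
        while (st < len && (value[len - 1] & 0xff) <= ' ') {
            len--;
        }
        return new String(Arrays.copyOfRange(value, st, len), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] blank = "   ".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sketchStrip(blank) == ""); // true: same pooled constant
        System.out.println(sketchTrim(blank) == "");  // false: freshly allocated object
    }
}
```

In this model the blank input costs the strip-like path a single scan, while the trim-like path pays for a scan plus an array copy and an object allocation, which is consistent with the gap seen in the benchmark.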

Based on this assumption, we can update the benchmark to also compare against a non-empty String, which shouldn't be affected by the String.strip() optimization mentioned above:

import org.openjdk.jmh.annotations.*;

import static java.util.concurrent.TimeUnit.MILLISECONDS;

@Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
@BenchmarkMode(Mode.Throughput)
public class MyBenchmark {

  public static final String EMPTY_STRING = "   "; // 3 whitespaces
  public static final String NOT_EMPTY_STRING = "  a "; // 3 whitespaces with a in the middle

  @Benchmark
  public void testEmptyTrim() {
    EMPTY_STRING.trim();
  }

  @Benchmark
  public void testEmptyStrip() {
    EMPTY_STRING.strip();
  }

  @Benchmark
  public void testNotEmptyTrim() {
    NOT_EMPTY_STRING.trim();
  }

  @Benchmark
  public void testNotEmptyStrip() {
    NOT_EMPTY_STRING.strip();
  }

}

Running it shows no significant difference between strip() and trim() for a non-empty String. Oddly enough trimming to an empty String is still the slowest:

Benchmark                       Mode  Cnt           Score           Error  Units
MyBenchmark.testEmptyStrip     thrpt  100  1887848947.416 ± 257906287.634  ops/s
MyBenchmark.testEmptyTrim      thrpt  100   206638996.217 ±  57952310.906  ops/s
MyBenchmark.testNotEmptyStrip  thrpt  100   399701777.916 ±   2429785.818  ops/s
MyBenchmark.testNotEmptyTrim   thrpt  100   385144724.856 ±   3928016.232  ops/s
answered Oct 14 '22 20:10 by Karol Dowbecki


After looking into the source code of OpenJDK, assuming the implementation of the Oracle version is similar, I would imagine the difference is explained by the facts that

  • strip will try to find the first non-whitespace character, and if none is found, simply returns ""
  • trim will always return a new String(...the substring...)

One could argue that strip is just a tiny bit more optimised than trim, at least in OpenJDK, because it avoids creating a new object unless necessary.

(Note: I didn't take the trouble to check the Unicode versions of these methods.)
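Beyond allocation behaviour, the two methods also differ in what they treat as whitespace, which is worth keeping in mind when comparing them on non-Latin input. The following sketch relies only on documented behaviour: trim() removes leading and trailing characters whose code point is at or below U+0020, while strip() uses Character.isWhitespace. The class name and sample strings are made up for illustration, and strip() requires Java 11 or later.

```java
// Hypothetical demo class: shows that trim() and strip() use different whitespace definitions.
final class WhitespaceDefinitions {

    // EM SPACE (U+2003) is Unicode whitespace but lies above U+0020.
    static final String EM_SPACED = "\u2003abc\u2003";

    // U+0001 is a control character below U+0020 that Character.isWhitespace rejects.
    static final String CONTROL_PADDED = "\u0001abc\u0001";

    public static void main(String[] args) {
        System.out.println(EM_SPACED.trim().length());       // 5: trim() leaves the EM SPACEs in place
        System.out.println(EM_SPACED.strip().length());      // 3: strip() removes them
        System.out.println(CONTROL_PADDED.trim().length());  // 3: trim() removes the control characters
        System.out.println(CONTROL_PADDED.strip().length()); // 5: strip() keeps them
    }
}
```

So the two methods are not drop-in replacements for each other, independently of how fast either one is.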

answered Oct 14 '22 21:10 by Sami Hult


Yes. In Java 11, trim() appears to create a new String whenever it removes characters, even if the result is empty, whereas strip() returns the cached "" constant. You can verify this yourself with a simple test:

public class JavaClass {
    public static void main(String[] args) {
        // prints false: trim() created a new String
        System.out.println("     ".trim() == "");
    }
}

vs

public class JavaClass {
    public static void main(String[] args) {
        // prints true: strip() returned the cached ""
        System.out.println("     ".strip() == "");
    }
}
answered Oct 14 '22 20:10 by chiperortiz