I've encountered an interesting scenario. For some reason, strip() on a blank string (one that contains only whitespace) is significantly faster than trim() in Java 11.
Benchmark
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

import static java.util.concurrent.TimeUnit.MILLISECONDS;

public class Test {
    public static final String TEST_STRING = "   "; // 3 whitespaces

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testTrim() {
        TEST_STRING.trim();
    }

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testStrip() {
        TEST_STRING.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}
Results
# Run complete. Total time: 00:04:16
Benchmark         Mode  Cnt           Score          Error  Units
Test.testStrip   thrpt  200  2067457963.295 ± 12353310.918  ops/s
Test.testTrim    thrpt  200   402307182.894 ±  4559641.554  ops/s
Apparently strip() outperforms trim() by a factor of about 5.
For a non-blank string, however, the results are almost identical:
public class Test {
    public static final String TEST_STRING = " Test String ";

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testTrim() {
        TEST_STRING.trim();
    }

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testStrip() {
        TEST_STRING.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}
# Run complete. Total time: 00:04:16
Benchmark         Mode  Cnt          Score         Error  Units
Test.testStrip   thrpt  200  126939018.461 ± 1462665.695  ops/s
Test.testTrim    thrpt  200  141868439.680 ± 1243136.707  ops/s
How come? Is this a bug or am I doing it wrong?
Testing environment
I added more performance tests for different strings (empty, blank, etc.).
Benchmark
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

import static java.util.concurrent.TimeUnit.SECONDS;

@Warmup(iterations = 5, time = 1, timeUnit = SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = SECONDS)
@Fork(value = 3)
@BenchmarkMode(Mode.Throughput)
public class Test {
    private static final String BLANK = "";      // zero-length string, despite the constant's name
    private static final String EMPTY = "   ";   // 3 spaces, i.e. a blank string, despite the constant's name
    private static final String ASCII = " abc "; // ASCII characters only
    private static final String UNICODE = " абв "; // Russian characters
    private static final String BIG = EMPTY.concat("Test".repeat(100)).concat(EMPTY);

    @Benchmark
    public void blankTrim() {
        BLANK.trim();
    }

    @Benchmark
    public void blankStrip() {
        BLANK.strip();
    }

    @Benchmark
    public void emptyTrim() {
        EMPTY.trim();
    }

    @Benchmark
    public void emptyStrip() {
        EMPTY.strip();
    }

    @Benchmark
    public void asciiTrim() {
        ASCII.trim();
    }

    @Benchmark
    public void asciiStrip() {
        ASCII.strip();
    }

    @Benchmark
    public void unicodeTrim() {
        UNICODE.trim();
    }

    @Benchmark
    public void unicodeStrip() {
        UNICODE.strip();
    }

    @Benchmark
    public void bigTrim() {
        BIG.trim();
    }

    @Benchmark
    public void bigStrip() {
        BIG.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}
Results
# Run complete. Total time: 00:05:23
Benchmark           Mode  Cnt           Score          Error  Units
Test.asciiStrip    thrpt   15   356846913.133 ±  4096617.178  ops/s
Test.asciiTrim     thrpt   15   371319467.629 ±  4396583.099  ops/s
Test.bigStrip      thrpt   15    29058105.304 ±  1909323.104  ops/s
Test.bigTrim       thrpt   15    28529199.298 ±  1794655.012  ops/s
Test.blankStrip    thrpt   15  1556405453.206 ± 67230630.036  ops/s
Test.blankTrim     thrpt   15  1587932109.069 ± 19457780.528  ops/s
Test.emptyStrip    thrpt   15  2126290275.733 ± 23402906.719  ops/s
Test.emptyTrim     thrpt   15   406354680.805 ± 14359067.902  ops/s
Test.unicodeStrip  thrpt   15    37320438.099 ±   399421.799  ops/s
Test.unicodeTrim   thrpt   15    88226653.577 ±  1628179.578  ops/s
Testing environment is the same.
Only one interesting finding: a String which contains Unicode characters is processed faster by trim() than by strip().
The trim() method allocates a new String object whenever it has something to remove, even if the result is empty. The strip() method optimizes stripping to an empty String by returning an interned String constant. The strip() method is the recommended way to remove whitespace because it follows the Unicode definition of whitespace.
strip() is an instance method that returns a string whose value is the string with all leading and trailing white spaces removed. This method was introduced in Java 11. If the string contains only white spaces, then applying this method will result in an empty string.
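To make the "Unicode standard" point concrete, here is a small sketch of where the two methods disagree. It relies only on the documented definitions: trim() removes characters whose code point is at or below U+0020, while strip() removes characters for which Character.isWhitespace returns true. The class and constant names are made up for illustration.

```java
// Illustrative only: shows inputs on which trim() and strip() give different results.
final class WhitespaceDefinitions {

    // EM SPACE (U+2003) is Unicode white space, but its code point is above U+0020.
    static final String EM_SPACED = "\u2003hello\u2003";

    // U+0001 is a control character below U+0020, but it is not Unicode white space.
    static final String CONTROL_PADDED = "\u0001hello\u0001";

    public static void main(String[] args) {
        // strip() removes the EM SPACEs, trim() leaves them in place.
        System.out.println(EM_SPACED.strip().length()); // 5
        System.out.println(EM_SPACED.trim().length());  // 7

        // trim() removes the control characters, strip() leaves them in place.
        System.out.println(CONTROL_PADDED.trim().length());  // 5
        System.out.println(CONTROL_PADDED.strip().length()); // 7
    }
}
```

So the two methods are not drop-in replacements for each other on every input, which matters more than the throughput numbers if your data can contain such characters.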
On OpenJDK 11.0.1, String.strip() (actually StringLatin1.strip()) optimizes stripping to an empty String by returning an interned String constant:
public static String strip(byte[] value) {
    int left = indexOfNonWhitespace(value);
    if (left == value.length) {
        return "";
    }
    // ... (rest of the method omitted)
}
String.trim() (actually StringLatin1.trim()), by contrast, allocates a new String object whenever it has something to remove. In your example st = 3 and len = 3, so the condition st > 0 holds and
return ((st > 0) || (len < value.length)) ?
        newString(value, st, len - st) : null;
ends up calling newString, which under the hood copies the array and creates a new String object:
return new String(Arrays.copyOfRange(val, index, index + len),
        LATIN1);
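The difference between the two code paths can be reproduced with a simplified model. This is only a sketch of the behaviour described above, not the JDK source: the class and method names are made up, and it works on String and char instead of the internal byte[] representation.

```java
// Simplified model of the two code paths; NOT the actual JDK implementation.
final class StripTrimModel {

    // Models strip(): an all-white-space input returns the shared "" literal.
    static String modelStrip(String s) {
        int left = 0;
        while (left < s.length() && Character.isWhitespace(s.charAt(left))) {
            left++;
        }
        if (left == s.length()) {
            return ""; // fast path: no array copy, no new object
        }
        int right = s.length();
        while (right > left && Character.isWhitespace(s.charAt(right - 1))) {
            right--;
        }
        return (left > 0 || right < s.length()) ? s.substring(left, right) : s;
    }

    // Models trim() as in OpenJDK 11: any trimming copies the range into a new String,
    // even when the remaining range is empty.
    static String modelTrim(String s) {
        int len = s.length();
        int st = 0;
        while (st < len && s.charAt(st) <= ' ') {
            st++;
        }
        while (st < len && s.charAt(len - 1) <= ' ') {
            len--;
        }
        return (st > 0 || len < s.length())
                ? new String(s.toCharArray(), st, len - st) // always a fresh object
                : s;
    }

    public static void main(String[] args) {
        String blank = "   "; // 3 whitespaces
        System.out.println(modelStrip(blank) == "");    // true: shared constant
        System.out.println(modelTrim(blank) == "");     // false: freshly allocated
        System.out.println(modelTrim(blank).isEmpty()); // true: same content
    }
}
```

In the model, as in the excerpts above, the blank input costs modelTrim an array copy and an allocation, while modelStrip returns before doing either.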
Based on that assumption, we can update the benchmark to also compare against a non-empty String, which shouldn't be affected by the String.strip() optimization mentioned above:
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

import static java.util.concurrent.TimeUnit.MILLISECONDS;

@Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
@BenchmarkMode(Mode.Throughput)
public class MyBenchmark {
    public static final String EMPTY_STRING = "   "; // 3 whitespaces
    public static final String NOT_EMPTY_STRING = " a "; // 3 whitespaces with a in the middle

    @Benchmark
    public void testEmptyTrim() {
        EMPTY_STRING.trim();
    }

    @Benchmark
    public void testEmptyStrip() {
        EMPTY_STRING.strip();
    }

    @Benchmark
    public void testNotEmptyTrim() {
        NOT_EMPTY_STRING.trim();
    }

    @Benchmark
    public void testNotEmptyStrip() {
        NOT_EMPTY_STRING.strip();
    }
}
Running it shows no significant difference between strip() and trim() for a non-empty String. Oddly enough, trimming to an empty String is still the slowest:
Benchmark                       Mode  Cnt           Score           Error  Units
MyBenchmark.testEmptyStrip     thrpt  100  1887848947.416 ± 257906287.634  ops/s
MyBenchmark.testEmptyTrim      thrpt  100   206638996.217 ±  57952310.906  ops/s
MyBenchmark.testNotEmptyStrip  thrpt  100   399701777.916 ±   2429785.818  ops/s
MyBenchmark.testNotEmptyTrim   thrpt  100   385144724.856 ±   3928016.232  ops/s
After looking into the source code of OpenJDK, and assuming the implementation of the Oracle version is similar, I would imagine the difference is explained by the fact that:
strip will try to find the first non-whitespace character and, if none is found, simply returns "".
trim will return a new String(...the substring...) whenever it has anything to remove, even if nothing is left.
One could argue that strip is just a tiny bit more optimised than trim, at least in OpenJDK, because it dodges the creation of a new object unless necessary.
(Note: I didn't take the trouble to check the unicode versions of these methods.)
Yep. In Java 11 it seems that trim() creates a new String when it trims down to nothing, whereas strip() returns the cached "" constant. You can run this simple code and see for yourself.
public class JavaClass {
    public static void main(String[] args) {
        // prints false: trim() created a new String
        System.out.println(" ".trim() == "");
    }
}
vs
public class JavaClass {
    public static void main(String[] args) {
        // prints true: strip() returned the cached ""
        System.out.println(" ".strip() == "");
    }
}
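One caveat about the two snippets above: == compares object identity, so what they show is an implementation detail of this particular JDK and might differ on other releases. For real code you'd compare contents instead. A minimal sketch (class and method names are just for illustration):

```java
// Illustrative helper: content checks are stable, identity checks are not guaranteed.
final class CompareStripped {

    // Content comparison: true whenever the stripped text is empty,
    // no matter which String object the JDK hands back.
    static boolean strippedToEmpty(String s) {
        return s.strip().isEmpty();
    }

    static boolean trimmedToEmpty(String s) {
        return s.trim().isEmpty();
    }

    // Identity comparison, as in the snippets above. This only observes
    // whether strip() returned the shared "" constant.
    static boolean stripReturnsSharedEmpty(String s) {
        return s.strip() == "";
    }

    public static void main(String[] args) {
        System.out.println(strippedToEmpty(" "));         // true
        System.out.println(trimmedToEmpty(" "));          // true
        System.out.println(stripReturnsSharedEmpty(" ")); // true
        System.out.println(strippedToEmpty(" a "));       // false
    }
}
```

Both content checks agree for a blank input; only the identity check exposes which object was returned.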