Why is <code>indexOf</code> significantly faster than <code>contains</code> if the latter is merely a wrapper of the first one? Code from the Java API: <pre class="prettyprint"><code>public boolean contains(CharSequence s) { return indexOf(s.toString()) > -1; } </code></pre> The chosen answer in this thread shows a short test, which shows the difference. The chosen answer in this thread states that the overhead of the additional method call does not matter. So, why the difference? Please read my edit: Almost everyone is saying the micro-benchmark is flawed. Strange thing is, it reflects exactly my use-case. Actually, I was not doubting that <code>indexOf</code> is faster than <code>contains</code> (for my use-case) in the first place, I just wanted to know why. My intention was never to write a benchmark! I was just searching for the most-efficient way to test if a string contains another one (for my application which has nothing to do with a benchmark but a 'real-life situation').

The <code>contains</code> method is implemented as: <pre class="prettyprint"><code>public boolean contains(CharSequence s) { return indexOf(s.toString()) > -1; } </code></pre> Which means that it may be slower if <code>CharSequence s</code> is not a <code>java.lang.String</code> because calling <code>s.toString()</code> will result in the allocation and initialization of a new string instance. If <code>s</code> is a string - then there should not be any measurable difference. PS: The test from here is flawed: https://stackoverflow.com/a/18340277/2588800 Java initially execute in "interpreted" mode, which is quite slow, and when it detects that a piece of codes gets executed a lot of times it compiles it to native code in order to speed it up (read about JIT compilation). As you can see <code>contains</code> internally calls <code>indexOf</code>, which means that <code>indexOf</code> eventually will get compiled to native. So when he tests the <code>indexOf</code> (notice that he tests it after <code>contains</code>) it might have already been compiled to native code. And that's the reason for the time difference. Try to reverse the order of that tests - first test <code>indexOf</code> and then <code>contains</code> and I bet you'll see just the opposite results. <h3>JMH to the rescue</h3> <pre class="prettyprint"><code>Benchmark Mode Cnt Score Error Units StringSearchBenchmark.testContains thrpt 500 22,071 ± 0,269 ops/us StringSearchBenchmark.testIndexOf thrpt 500 22,654 ± 0,233 ops/us </code></pre> As you can see the difference is negligible and might be casued by the additional method calls (<code>indexOf()</code> + <code>toString()</code>) and load on the system. Source-Code: <pre class="prettyprint"><code>@Fork(1) @State(Scope.Benchmark) @OutputTimeUnit(TimeUnit.MICROSECONDS) @Measurement(iterations = 500, time = 50, timeUnit = TimeUnit.MILLISECONDS) @Warmup(iterations = 10) @BenchmarkMode(Mode.Throughput) public class StringSearchBenchmark { private static final String STATE = "absdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyz"; private static final String SEARCH_TERM = "abcd"; @Benchmark public void testContains(Blackhole sink) { sink.consume(STATE.contains(SEARCH_TERM)); } @Benchmark public void testIndexOf(Blackhole sink) { sink.consume(STATE.indexOf(SEARCH_TERM)); } } </code></pre>

As others have said, the benchmark was heavily flawed - performance testing of Java code does not work like that - you must warm it up to ensure that all classes have been loaded and parsed, that all objects have been loaded into memory, and that any compiling down to native code, e.g. via HotSpot, has been done. A naïve benchmark where you just run the code once in the main method is not really going to fly. A much better choice is to use something like JMH. Given the following test: <pre class="prettyprint"><code>package com.stackoverflow.example; import org.openjdk.jmh.annotations.Benchmark; import org.openjdk.jmh.runner.Runner; import org.openjdk.jmh.runner.RunnerException; import org.openjdk.jmh.runner.options.Options; import org.openjdk.jmh.runner.options.OptionsBuilder; @OutputTimeUnit(TimeUnit.MICROSECONDS) @Measurement(time = 250, timeUnit = TimeUnit.MILLISECONDS) public class MyBenchmark { private static final String[] names = new String[]{"jack", "jackson", "jason", "jadifu"}; public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder() .include(MyBenchmark.class.getSimpleName()) .forks(1) .build(); new Runner(opt).run(); } @Benchmark public void contains() { names[0].contains("ja"); } @Benchmark public void containsExplicit() { names[0].indexOf("ja".toString()); } @Benchmark public void indexOf() { names[0].indexOf("ja"); } @Benchmark public void matches() { names[0].matches(".*ja.*"); } } </code></pre> I get the following results: <pre class="prettyprint"><code>Benchmark Mode Cnt Score Error Units MyBenchmark.contains thrpt 20 219.770 ± 2.032 ops/us MyBenchmark.containsExplicit thrpt 20 1820.024 ± 20.583 ops/us MyBenchmark.indexOf thrpt 20 1828.234 ± 18.744 ops/us MyBenchmark.matches thrpt 20 3.933 ± 0.052 ops/us </code></pre> Now, that's fairly interesting, as it still suggests that <code>contains</code> is significantly slower than <code>indexOf</code>. However, if I change the test up, very slightly, to the following: <pre class="prettyprint"><code>@Benchmark public void contains() { assert names[0].contains("ja"); } @Benchmark public void containsExplicit() { assert names[0].indexOf("ja".toString()) == 0; } @Benchmark public void indexOf() { assert names[0].indexOf("ja") == 0; } @Benchmark public void matches() { assert names[0].matches(".*ja.*"); } </code></pre> I get the following results: <pre class="prettyprint"><code>Benchmark Mode Cnt Score Error Units MyBenchmark.contains thrpt 20 220.480 ± 1.266 ops/us MyBenchmark.containsExplicit thrpt 20 219.962 ± 2.329 ops/us MyBenchmark.indexOf thrpt 20 219.706 ± 2.401 ops/us MyBenchmark.matches thrpt 20 3.766 ± 0.026 ops/us </code></pre> In this, we're getting the same result for contains, but <code>indexOf</code> has slowed down to match <code>contains</code>. That's a very interesting result. Why is this happening? Probably due to HotSpot recognising that the result of the <code>indexOf</code> call is never inspected, and since it's taking a <code>final</code> class (<code>String</code>), HotSpot is likely able to guarantee that there are no side effects to the call. So if we're not looking at the result and there are no side effects to the call, why are we making it? HotSpot is able to realise that a method call is pointless, and remove it altogether, which could be what's happening here. It would certainly explain the order of magnitude difference. Why doesn't this work for <code>contains</code>, though? I can only assume that it's because <code>contains</code> accepts a <code>CharSequence</code>, not a <code>String</code>, which is an abstract class, and that's just enough to prevent HotSpot from optimising the method call away. This also indicates that micro-benchmarks are hard in Java - there is a lot going on beneath the surface to optimise your running code, and a few shortcuts can result in extremely inaccurate benchmarks.

String: Why is indexOf significantely faster than contains?

Tags:

java

performance

string

Why is indexOf significantly faster than contains if the latter is merely a wrapper of the first one?

Code from the Java API:

public boolean contains(CharSequence s) {
     return indexOf(s.toString()) > -1;
}

The chosen answer in this thread shows a short test, which shows the difference.

The chosen answer in this thread states that the overhead of the additional method call does not matter. So, why the difference?

Please read my edit: Almost everyone is saying the micro-benchmark is flawed. Strange thing is, it reflects exactly my use-case.

Actually, I was not doubting that indexOf is faster than contains (for my use-case) in the first place, I just wanted to know why.

My intention was never to write a benchmark! I was just searching for the most-efficient way to test if a string contains another one (for my application which has nothing to do with a benchmark but a 'real-life situation').

296

asked Aug 19 '16 09:08

Willi Mentzel

3 Answers

The contains method is implemented as:

public boolean contains(CharSequence s) {
    return indexOf(s.toString()) > -1;
}

Which means that it may be slower if CharSequence s is not a java.lang.String because calling s.toString() will result in the allocation and initialization of a new string instance. If s is a string - then there should not be any measurable difference.

PS: The test from here is flawed: https://stackoverflow.com/a/18340277/2588800 Java initially execute in "interpreted" mode, which is quite slow, and when it detects that a piece of codes gets executed a lot of times it compiles it to native code in order to speed it up (read about JIT compilation).

As you can see contains internally calls indexOf, which means that indexOf eventually will get compiled to native. So when he tests the indexOf (notice that he tests it after contains) it might have already been compiled to native code. And that's the reason for the time difference. Try to reverse the order of that tests - first test indexOf and then contains and I bet you'll see just the opposite results.

JMH to the rescue

Benchmark                            Mode  Cnt   Score   Error   Units
StringSearchBenchmark.testContains  thrpt  500  22,071 ± 0,269  ops/us
StringSearchBenchmark.testIndexOf   thrpt  500  22,654 ± 0,233  ops/us

As you can see the difference is negligible and might be casued by the additional method calls (indexOf() + toString()) and load on the system.

Source-Code:

@Fork(1)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(iterations = 500, time = 50, timeUnit = TimeUnit.MILLISECONDS)
@Warmup(iterations = 10)
@BenchmarkMode(Mode.Throughput)
public class StringSearchBenchmark {
    private static final String STATE = "absdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyzabsdefghijklmnopqrstuvwxyz";
    private static final String SEARCH_TERM = "abcd";

    @Benchmark
    public void testContains(Blackhole sink) {
        sink.consume(STATE.contains(SEARCH_TERM));
    }

    @Benchmark
    public void testIndexOf(Blackhole sink) {
        sink.consume(STATE.indexOf(SEARCH_TERM));
    }
}

132

answered Oct 12 '22 23:10

Svetlin Zarev

As others have said, the benchmark was heavily flawed - performance testing of Java code does not work like that - you must warm it up to ensure that all classes have been loaded and parsed, that all objects have been loaded into memory, and that any compiling down to native code, e.g. via HotSpot, has been done. A naïve benchmark where you just run the code once in the main method is not really going to fly. A much better choice is to use something like JMH. Given the following test:

package com.stackoverflow.example;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Measurement(time = 250, timeUnit = TimeUnit.MILLISECONDS)    
public class MyBenchmark {

    private static final String[] names = new String[]{"jack", "jackson", "jason", "jadifu"};

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

    @Benchmark
    public void contains() {
        names[0].contains("ja");
    }

    @Benchmark
    public void containsExplicit() {
        names[0].indexOf("ja".toString());
    }

    @Benchmark
    public void indexOf() {
        names[0].indexOf("ja");
    }

    @Benchmark
    public void matches() {
        names[0].matches(".*ja.*");
    }
}

I get the following results:

Benchmark                      Mode  Cnt     Score    Error   Units
MyBenchmark.contains          thrpt   20   219.770 ±  2.032  ops/us
MyBenchmark.containsExplicit  thrpt   20  1820.024 ± 20.583  ops/us
MyBenchmark.indexOf           thrpt   20  1828.234 ± 18.744  ops/us
MyBenchmark.matches           thrpt   20     3.933 ±  0.052  ops/us

Now, that's fairly interesting, as it still suggests that contains is significantly slower than indexOf. However, if I change the test up, very slightly, to the following:

@Benchmark
public void contains() {
    assert names[0].contains("ja");
}

@Benchmark
public void containsExplicit() {
    assert names[0].indexOf("ja".toString()) == 0;
}

@Benchmark
public void indexOf() {
    assert names[0].indexOf("ja") == 0;
}

@Benchmark
public void matches() {
   assert names[0].matches(".*ja.*");
}

I get the following results:

Benchmark                      Mode  Cnt    Score   Error   Units
MyBenchmark.contains          thrpt   20  220.480 ± 1.266  ops/us
MyBenchmark.containsExplicit  thrpt   20  219.962 ± 2.329  ops/us
MyBenchmark.indexOf           thrpt   20  219.706 ± 2.401  ops/us
MyBenchmark.matches           thrpt   20    3.766 ± 0.026  ops/us

In this, we're getting the same result for contains, but indexOf has slowed down to match contains. That's a very interesting result. Why is this happening?

Probably due to HotSpot recognising that the result of the indexOf call is never inspected, and since it's taking a final class (String), HotSpot is likely able to guarantee that there are no side effects to the call. So if we're not looking at the result and there are no side effects to the call, why are we making it? HotSpot is able to realise that a method call is pointless, and remove it altogether, which could be what's happening here. It would certainly explain the order of magnitude difference.

Why doesn't this work for contains, though? I can only assume that it's because contains accepts a CharSequence, not a String, which is an abstract class, and that's just enough to prevent HotSpot from optimising the method call away.

This also indicates that micro-benchmarks are hard in Java - there is a lot going on beneath the surface to optimise your running code, and a few shortcuts can result in extremely inaccurate benchmarks.

answered Oct 12 '22 23:10

ipsi

indexOf is sample of intrinsic method in Hotspot JVM. It means that java code from java.lang.String is not used at all for this method. There is special native version of this method. You can find list of such methods here: do_intrinsic(_indexOf

answered Oct 13 '22 01:10

sibnick

Related questions
                            
                                Vaadin - Refresh grid after row modification
                            
                                Java type casting - float (and long) to int
                            
                                How to return if optional is present and continue if not
                            
                                Java equivalent of an OpenSSL AES CBC encryption
                            
                                Passing info between steps in Spring? [duplicate]
                            
                                Convert Joda LocalDateTime to Unix timestamp
                            
                                Passing unknown ArrayList type to method
                            
                                Equivalent to map (from haskell) in Java 7
                            
                                Calendar - Get last day of previous month
                            
                                How to return a default boolean value in java streams if element not found?
                            
                                Decide what constructor call based on user input
                            
                                Run specific JMH benchmarks from jar
                            
                                Passing pageable (spring data) in post request
                            
                                Use @Validated and @Valid with spring validator
                            
                                What does line.split(",")[1] mean [Java]?
                            
                                What is the size of methods that JIT automatically inlines?
                            
                                Adding timestamp to a log file using Logback-test.xml
                            
                                RxJava noClassDefFoundError: rx.plugins.RxJavaPlugins on API 16
                            
                                Java nio2 Directory is not closed. Causes "too many open files" error [closed]
                            
                                How to use RestTemplate with application/octet-stream response type

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With