I'm comparing 2 ways to filter lists, with and without using streams. It turns out that the method without using streams is faster for a list of 10,000 items. I'm interested in understanding why is it so. Can anyone explain the results please?
public static int countLongWordsWithoutUsingStreams(
final List<String> words, final int longWordMinLength) {
words.removeIf(word -> word.length() <= longWordMinLength);
return words.size();
}
public static int countLongWordsUsingStreams(final List<String> words, final int longWordMinLength) {
return (int) words.stream().filter(w -> w.length() > longWordMinLength).count();
}
Microbenchmark using JMH:
@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsWithoutUsingStreams() {
countLongWordsWithoutUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}
@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsUsingStreams() {
countLongWordsUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}
public static void main(String[] args) throws RunnerException {
final Options opts = new OptionsBuilder()
.include(PracticeQuestionsCh8Benchmark.class.getSimpleName())
.warmupIterations(5).measurementIterations(5).forks(1).build();
new Runner(opts).run();
}
java -jar target/benchmarks.jar -wi 5 -i 5 -f 1
Benchmark
Mode Cnt Score Error Units
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsUsingStreams thrpt 5 10.219 ± 0.408 ops/ms
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsWithoutUsingStreams thrpt 5 910.785 ± 21.215 ops/ms
Edit: (as someone deleted the update posted as an answer)
public class PracticeQuestionsCh8Benchmark {
private static final int NUM_WORDS = 10000;
private static final int LONG_WORD_MIN_LEN = 10;
private final List<String> words = makeUpWords();
public List<String> makeUpWords() {
List<String> words = new ArrayList<>();
final Random random = new Random();
for (int i = 0; i < NUM_WORDS; i++) {
if (random.nextBoolean()) {
/*
* Do this to avoid string interning. c.f.
* http://en.wikipedia.org/wiki/String_interning
*/
words.add(String.format("%" + LONG_WORD_MIN_LEN + "s", i));
} else {
words.add(String.valueOf(i));
}
}
return words;
}
@Benchmark
@BenchmarkMode(AverageTime)
@OutputTimeUnit(MILLISECONDS)
public int benchmarkCountLongWordsWithoutUsingStreams() {
return countLongWordsWithoutUsingStreams(words, LONG_WORD_MIN_LEN);
}
@Benchmark
@BenchmarkMode(AverageTime)
@OutputTimeUnit(MILLISECONDS)
public int benchmarkCountLongWordsUsingStreams() {
return countLongWordsUsingStreams(words, LONG_WORD_MIN_LEN);
}
}
public static int countLongWordsWithoutUsingStreams(
final List<String> words, final int longWordMinLength) {
final Predicate<String> p = s -> s.length() >= longWordMinLength;
int count = 0;
for (String aWord : words) {
if (p.test(aWord)) {
++count;
}
}
return count;
}
public static int countLongWordsUsingStreams(final List<String> words,
final int longWordMinLength) {
return (int) words.stream()
.filter(w -> w.length() >= longWordMinLength).count();
}
Whenever your benchmark says that some operation over 10000 elements takes 1ns (edit: 1µs), you probably found a case of clever JVM figuring out that your code doesn't actually do anything.
Collections.nCopies
doesn't actually make a list of 10000 elements. It makes a sort of a fake list with 1 element and a count of how many times it's supposedly there. That list is also immutable, so your countLongWordsWithoutUsingStreams
would throw an exception if there was something for removeIf
to do.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With