Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference in lambda performances?

This is not a duplicate of my question. I checked it and it is more about inner anonymous classes.

I was curious about Lambda expressions and tested the following :

  • Given an array of ten thousand entries, what would be the faster to delete certain indexes : Lamba expression or For-Loop with an if test inside?

First results were not surprising in the fact that I did not know what I was going to come up with :

final int NUMBER_OF_LIST_INDEXES = 10_000;
List<String> myList = new ArrayList<>();
String[] myWords = "Testing Lamba expressions with this String array".split(" ");
    
for (int i = 0 ; i < NUMBER_OF_LIST_INDEXES ; i++){
    myList.add(myWords[i%6]);
}
        
long time = System.currentTimeMillis();
        
// BOTH TESTS WERE RUN SEPARATELY OF COURSE
// PUT THE UNUSED ONE IN COMMENTS WHEN THE OTHER WAS WORKING
        
// 250 milliseconds for the Lambda Expression
myList.removeIf(x -> x.contains("s"));
        
// 16 milliseconds for the traditional Loop
for (int i = NUMBER_OF_LIST_INDEXES - 1 ; i >= 0 ; i--){
    if (myList.get(i).contains("s")) myList.remove(i);
}
System.out.println(System.currentTimeMillis() - time + " milliseconds");

But then, I decided to change the constant NUMBER_OF_LIST_INDEXES to one million and here is the result :

final int NUMBER_OF_LIST_INDEXES = 1_000_000;
List<String> myList = new ArrayList<>();
String[] myWords = "Testing Lamba expressions with this String array".split(" ");
    
for (int i = 0 ; i < NUMBER_OF_LIST_INDEXES ; i++){
    myList.add(myWords[i%6]);
}
    
long time = System.currentTimeMillis();
    
// BOTH TESTS WERE RUN SEPARATELY OF COURSE
// PUT THE UNUSED ONE IN COMMENTS WHEN THE OTHER WAS WORKING
    
// 390 milliseconds for the Lambda Expression
myList.removeIf(x -> x.contains("s"));
    
// 32854 milliseconds for the traditional Loop
for (int i = NUMBER_OF_LIST_INDEXES - 1 ; i >= 0 ; i--){ 
    if (myList.get(i).contains("s")) myList.remove(i);
}
System.out.println(System.currentTimeMillis() - time + " milliseconds");

To make things simpler to read, here are the results :

|        |  10.000 | 1.000.000 |
| LAMBDA |  250ms  |   390ms   | 156% evolution
|FORLOOP |   16ms  |  32854ms  | 205000+% evolution

I have the following questions :

  • What is magic behind this? How do we come to such a big difference for the array and not for the lambda when the indexes to work with is *100.

  • In terms of performance, how do we know when to use Lambdas and when to stick to traditional ways to work with data?

  • Is this a specific behavior of the List method? Are other Lambda expression also produce random performances like this one?

like image 579
Yassin Hajaj Avatar asked Oct 17 '15 02:10

Yassin Hajaj


3 Answers

Because remove(index) is very expensive :) It needs to copy and shift the rest of elements, and this is done multiple times in your case.

While removeIf(filter) does not need to do that. It can sweep once and mark all elements to be deleted; then the final phase copies survivors to the head of list just once.

like image 80
ZhongYu Avatar answered Oct 17 '22 01:10

ZhongYu


I wrote a JMH benchmark to measure this. There are 4 methods in it:

  • removeIf on an ArrayList.
  • removeIf on a LinkedList.
  • iterator with iterator.remove() on an ArrayList.
  • iterator with iterator.remove() on a LinkedList.

The point of the benchmark is to show that removeIf and an iterator should provide the same performance, but that it is not the case for an ArrayList.

By default, removeIf uses an iterator internally to remove the elements so we should expect the same performance with removeIf and with an iterator.


Now consider an ArrayList which uses an array internally to hold the elements. Everytime we call remove, the remaining elements after the index have to be shifted by one; so each time a lot of elements have to be copied. When an iterator is used to traverse the ArrayList and we need to remove an element, this copying needs to happen again and again, making this very slow. For a LinkedList, this is not the case: when an element is deleted, the only change is the pointer to the next element.

So why is removeIf as fast on an ArrayList as on a LinkedList? Because it is actually overriden and it does not use an iterator: the code actually marks the elements to be deleted in a first pass and then deletes them (shifting the remaining elements) in a second pass. An optimization is possible in this case: instead of shifting the remaining elements each time one needs to be removed, we only do it once when we know all the elements that need to be removed.


Conclusion:

  • removeIf should be used when one needs to remove every elements matching a predicate.
  • remove should be used to remove a single known element.

Result of benchmark:

Benchmark                            Mode  Cnt      Score      Error  Units
RemoveTest.removeIfArrayList         avgt   30      4,478 ±   0,194  ms/op
RemoveTest.removeIfLinkedList        avgt   30      3,634 ±   0,184  ms/op
RemoveTest.removeIteratorArrayList   avgt   30  27197,046 ± 536,584  ms/op
RemoveTest.removeIteratorLinkedList  avgt   30      3,601 ±   0,195  ms/op

Benchmark:

@Warmup(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(3)
@State(Scope.Benchmark)
public class RemoveTest {

    private static final int NUMBER_OF_LIST_INDEXES = 1_000_000;
    private static final String[] words = "Testing Lamba expressions with this String array".split(" ");

    private ArrayList<String> arrayList;
    private LinkedList<String> linkedList;

    @Setup(Level.Iteration)
    public void setUp() {
        arrayList = new ArrayList<>();
        linkedList = new LinkedList<>();
        for (int i = 0 ; i < NUMBER_OF_LIST_INDEXES ; i++){
            arrayList.add(words[i%6]);
            linkedList.add(words[i%6]);
        }
    }

    @Benchmark
    public void removeIfArrayList() {
        arrayList.removeIf(x -> x.contains("s"));
    }

    @Benchmark
    public void removeIfLinkedList() {
        linkedList.removeIf(x -> x.contains("s"));
    }

    @Benchmark
    public void removeIteratorArrayList() {
        for (ListIterator<String> it = arrayList.listIterator(arrayList.size()); it.hasPrevious();){
            if (it.previous().contains("s")) it.remove();
        }
    }

    @Benchmark
    public void removeIteratorLinkedList() {
        for (ListIterator<String> it = linkedList.listIterator(linkedList.size()); it.hasPrevious();){
            if (it.previous().contains("s")) it.remove();
        }
    }

    public static void main(String[] args) throws Exception {
         Main.main(args);
    }

}
like image 36
Tunaki Avatar answered Oct 17 '22 02:10

Tunaki


I think the performance difference you're seeing is probably due more to removeIf's use of an iterator internally vs. get and remove in your for loop. The answers in this PAQ have some good information on the benefits of iterators.

bayou.io's answer is spot on, you can see the code for removeIf here it does two passes to avoid shifting the remaining elements over and over.

like image 41
gordy Avatar answered Oct 17 '22 02:10

gordy