Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is getting a value from the end of a LinkedList much slower than from the start?

I have a LinkedList of 1,000,000 items. I measured the retrieval of an item first at index 100,000 and then at index 900,000. In both cases, the LinkedList goes through 100,000 operations to get to the desired index. So why is the retrieval from the end so much slower than from the start? Measurements taken with JMH.

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
public class ComparationGet {

    static int val1 = 100_000;
    static int val2 = 500_000;
    static int val3 = 900_000;

    @Benchmark
    public void testGet1LinkedListFromStart(Blackhole blackhole, MyState state) {
        MyDigit res1 = state.linkedList.get(val1);
        blackhole.consume(res1);
    }

    @Benchmark
    public void testGet2LinkedListFromEnd(Blackhole blackhole, MyState state) {
        MyDigit res1 = state.linkedList.get(val3);
        blackhole.consume(res3);
    }
}

Results:

from start:
ComparationGet.testGet1LinkedListFromStart  avgt   10  0,457 ± 0,207  ms/op

from end:
ComparationGet.testGet2LinkedListFromEnd  avgt   10  5,789 ± 3,094  ms/op

State class:

@State(Scope.Thread)
public class MyState {
    public List<MyDigit> linkedList;


    private int iterations = 1_000_000;

    @Setup(Level.Invocation)
    public void setUp() {
        linkedList = new LinkedList<>();

        for (int i = 0; i < iterations; i++) {
            linkedList.add(new MyDigit(i));
        }
    }
}

MyDigit class:

public class MyDigit{
    private int val;

    public MyDigit(int val) {
        this.val = val;
    }
}

LinkedList get method:

public E get(int index) {
    checkElementIndex(index);
    return node(index).item;
}

Node<E> node(int index) {
    // assert isElementIndex(index);

    if (index < (size >> 1)) {
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}
like image 695
Alexander Tukanov Avatar asked Aug 10 '20 13:08

Alexander Tukanov


Video Answer


1 Answers

LinkedList is a fine example of the limitations of fundamental informatics-based reasoning about algorithms. Basic reasoning about the code here, and treating a computer as a simple von neumann model, would dictate that either benchmark needs 100k steps to get from one 'end' to the desired item, and therefore, the benchmark should report equal times, give or take some statistical noise.

In actual fact, one is an order of magnitude slower than the other.

LinkedList is almost always the loser in such issues. In fact, as a rule of thumb, LinkedList should be banned from all codebases. It's almost always vastly slower than basic reasoning would indicate, and in the rare circumstances where LinkedList would (actually, in real benchmarks, not theoretically!) outperform an ArrayList, there's almost always a different type that's even more suitable, such as, say, ArrayDeque.

But, why?

There are many reasons. But usually it has to do with cache paging.

NB: For the CPU design expert: I've oversimplified rather a lot, to try to explain the key aspect (which is that cache misses drown out any algorithmic expectations).

Modern CPUs have hierarchical layers of memory. The slowest, by far, is 'main memory' (that 16GB of RAM or whatnot that you have). The CPU cannot actually read from main memory, at all. And yet O(n) analysis thinks that they can.

Then there's layers of caches, generally 3 (L1 to L3), and even faster than those, registers.

When you read some memory, what actually happens is that the system checks if what you want to read is mapped onto one of the caches, and only entire pages worth of memory can be, so it first checks which page your data is in, and then checks if said page is in one of those caches. If yes, great, the operation succeeds.

If not, uhoh. The CPU can't do your job. So instead, the CPU goes and does something else, or will just twiddle its thumbs for at least 500 cycles (more on faster CPUs!) whilst it evicts some page from one of the caches and copies over from main memory the page you wanted into one of the caches.

Only then can it continue.

Java guarantees that arrays are consecutive. if you declare, say, new int[1000000] java will guarantee that all 1000000 4-byte sequences are all right next to each other, so if you iterate through it, you get the minimum possible 'cache miss' events (where you read from some memory that isn't in one of the caches).

So, if you have an ArrayList, that is, well, backed by an array, so that array is guaranteed consecutive. However, the objects inside don't have to be. Unlike with new int[1000000], with new Object[1000000], you just have the pointers all consecutive; the actual objects they point at need not be.

However, for this test you've set up, that is immaterial, nothing in your code actually 'follows the pointer'.

In LinkedLists, you end up with no array at all, and instead with 2*X (X being the size of the list) objects: Your X objects you are storing, as well as X 'trackers'; each tracker contains a pointer (in java: reference) to the actual object being stored, as well as a 'previous' and 'next' pointer, pointing at its sibling tracker objects.

None of these are guaranteed to be consecutive in memory.

They could be smeared all over. Even just looping through each element in a list of 1000000, not following pointers at all, if the trackers are all over the place that's theoretically worst case scenario 1000000 case misses.

Cache misses are so slow, and CPUs are so fast, that you can safely consider the job of iterating through each tracker (or through each item in a 1000000-sized array) as entirely free, zero CPU time required, as long as you don't run into cache misses: The cache misses tend to dominate the time requirements.

You'd have to investigate further, but here is a plausible explanation for what you're witnessing:

Your code runs in isolation (it is not doing much else); so your init is running unimpeded, and whilst java makes no consecutive guarantees about any of this, your actual memory layout looks like: a MyDigit object, then a linkedlist tracker, then another mydigit object, then another linkedlist tracker, and so on.

Nevertheless, going from the last node involves a number of cache misses, whereas going from the front (which also had the benefit of starting at 'byte 0' of a page) isn't nearly as badly affected.

For reference, here is a chart of access times of fetching a certain sized chunk of data, assuming optimal caching - Note the biiig spike when you get to 4M.

like image 164
rzwitserloot Avatar answered Oct 20 '22 16:10

rzwitserloot