Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bjarne Stroustrup says we must avoid linked lists

I saw this video on YouTube: https://www.youtube.com/watch?v=YQs6IC-vgmo in which Bjarne says it is better to use vectors, rather than linked lists. I am unable to grasp the entire thing, so could anyone explain what he is saying in layman's terms?

P.S: I am an high school student and can easily handle linked lists, but I am struggling to learn vectors on my own. Could you suggest any sources to learn vectors?

like image 450
ShankRam Avatar asked Dec 09 '15 04:12

ShankRam


People also ask

Are linked lists still used?

There is a time and a place to use linked lists and most commonly it's when you want quickly add and remove elements from a container. Usually this occurs in stacks and queues with lower space time complexity over arrays or when you want to keep ordered data with more flexibility than arrays.

Are there linked lists in C++?

A linked list in C++ is a linear dynamic data structure. The node of a singly or a circular linked list has two parts: data and the address of the next node. The node of a doubly linked list has three parts: data, the address of the previous node, and the address of the next node.

Is linked list in C and C++ same?

A linked list in C/C++ is basically a linear data structure based on the concept of dynamic memory allocation. It is implemented with the help of pointers. The linked list in C and C++ tutorial is specially designed for the beginners, who are not aware of the importance of linked lists.

What is an intrusive linked list?

What are intrusive linked lists? Intrusive linked lists are a variation of linked lists where the links are embedded in the structure that's being linked. In a typical linked list implementation, a list node contains a data pointer to the linked data and a next pointer to the next node in the list.


1 Answers

Advantages of vector vs. linked list

The main advantage of vector versus linked lists is memory locality.

Usually, each element is allocated seperately in a linked list. As a consequence those elements probably are not next to each other in memory. (Gaps between the elements in memory.)

A vector is guaranteed to store all contained elements contiguously. (Items next to each other, no gaps;)

Note: Oversimplifications may occur... ;)

Imo, the simplified key points about the superior performance of a contiguously stored data storage pattern versus non-contiguous storage are

1. cache misses

Modern CPUs do not fetch single bytes from memory but slightly larger chunks. Thus, if your data objects size is less than the size of those chunks and if storage is contiguous, you can get more than one element at a time because multiple elements may be in a single chunk.

Example:

A 64byte block (usual cache line size) fits sixteen 32bit integers at a time. Therefore, a cache miss (data not in cache yet -> load from main memory required) occurs at the earliest after processing 16 elements from the moment first one has been brought to cache. If a linked list is used, the first element might well be the only one within a 64byte block. It might in theory happen that a cache miss occurs for every single element of the list.

Concrete example:

std::vector<std::uint32_t> v;
// somehow fill 64 values into v
std::uint32_t s{};
for(std::size_t i{0}; i<v.size(); ++i)
{
  s += v[i];
}

Imagine the contents of v not being cached yet.

What happens during the processing of the data in the for loop?

1)Check whether element v[0] is in cache. --> Nope

2)Fetch 64 bytes starting at the address of v[0] from main memory into a cache line

3)Load v[0] from cache and process by adding its value to s

4)Is element v1 in cache? --> Yes loaded with previous fetch because neighbouring v[0]

5)Load v1 from cache and process by adding its value to s

6)Is element v[2] in cache? --> Yes ...

7) Load v[2] from cache and process by adding its value to s

... etc...

34)Is element v[16] in cache? --> Nope

35) Fetch 64 bytes starting at the address of v[16] from main memory into a cache line

36)Load v[16] from cache and process by adding its value to s

37)Is element v[17] in cache? --> Yes loaded with previous fetch because neighbouring v[16]

etc...

Fetching data from main memory into the cache takes much more time than loading data from cache into processor registers and perform simple operations. Therefore, the fact that multiple values may reside on a single cache line can boost performance significantly.

Linked lists do not provide a contiguous storage guarantee and you cannot hope to get this performance boost. This is also the reason why random iteration (accessing elements randomly) performs worse than forward iteration (accessing elements in order) for contiguous containers.

2. prefetching

The above effect is amplified by a CPU feature called "prefetcher".

If a chunk has been loaded from main memory, the prefetcher prepares loading the next chunk / already puts it into cache, significantly reducing the penality of loading stuff from that part of the main memory.

This is of course effective if and only if you in fact need data from the next prepared chunk.

How do vectors usually work (internally)?

See: c++ Vector, what happens whenever it expands/reallocate on stack?

like image 73
Pixelchemist Avatar answered Oct 30 '22 00:10

Pixelchemist