The Linux kernel uses a linked list for TCP and a red-black tree for process scheduling.
In terms of algorithmic efficiency, these make sense. You're going to be getting a bunch of packets one at a time, so O(1) insertion at the end of the list is nice.
With process scheduling, the Completely Fair Scheduler uses a red-black tree, and it can choose the next task in O(1) time because it keeps a cached pointer to the leftmost (lowest-vruntime) node.
As far as I know, neither of these is implemented in a 'flat' way: the linked list is a bunch of nodes with pointers to other nodes, just like the tree. That means locality for both structures should be poor.
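For reference, here is a minimal sketch in the spirit of the kernel's intrusive struct list_head (include/linux/list.h). It is not the kernel's code; the element type (struct packet) and the helper names are made up for illustration, but it shows the "nodes with pointers to other nodes" layout I mean, where each element is a separate allocation and traversal chases pointers across the heap.

```c
/* Simplified sketch of an intrusive doubly linked list, in the same
 * spirit as the kernel's struct list_head. Not the kernel's code. */
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical element type: a packet-like record with an embedded link. */
struct packet {
    int id;
    struct list_node link;
};

/* Recover the containing struct from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

/* O(1): insert new_node just before head, i.e. at the tail. */
static void list_add_tail(struct list_node *new_node, struct list_node *head)
{
    new_node->prev = head->prev;
    new_node->next = head;
    head->prev->next = new_node;
    head->prev = new_node;
}

int main(void)
{
    struct list_node queue;
    list_init(&queue);

    for (int i = 0; i < 3; i++) {
        struct packet *p = malloc(sizeof(*p)); /* separate allocation per node */
        p->id = i;
        list_add_tail(&p->link, &queue);
    }

    /* Traversal follows next pointers; each hop may be a cache miss. */
    for (struct list_node *n = queue.next; n != &queue; n = n->next)
        printf("packet %d\n", container_of(n, struct packet, link)->id);

    return 0;
}
```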
From what I've seen, cache locality often outweighs algorithmic efficiency.
Is there something about the data set that Linux is programmed for that makes the algorithmic efficiency of these structures outweigh the cache efficiency of others?
Am I misunderstanding something? Has anyone profiled?
I have a partial answer for the use of linked lists, and I believe you will find some interesting information on this page about the CFS scheduler. In particular, it mentions that a red-black tree has good worst-case time for operations such as insert, search, and delete, which indeed seems like a very desirable property for a scheduler.
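To make the "choose the next task in O(1)" point from the question concrete, here is a rough sketch of the cached-leftmost idea. Two caveats: CFS keeps tasks in a red-black tree keyed by vruntime, but to stay short this sketch uses a plain unbalanced BST, so it lacks the O(log n) worst-case guarantee the balancing provides; and all the names (struct task, struct runqueue, enqueue, pick_next) are invented for illustration.

```c
/* Sketch of a run queue that caches the leftmost (lowest-vruntime) node
 * so picking the next task is O(1). CFS uses a red-black tree; this uses
 * a plain unbalanced BST purely to keep the example small. */
#include <stdio.h>

struct task {
    unsigned long vruntime;
    const char *name;
    struct task *left, *right;
};

struct runqueue {
    struct task *root;
    struct task *leftmost;   /* cached: task with the smallest vruntime */
};

static void enqueue(struct runqueue *rq, struct task *t)
{
    struct task **link = &rq->root;
    int is_leftmost = 1;

    while (*link) {
        if (t->vruntime < (*link)->vruntime) {
            link = &(*link)->left;
        } else {
            link = &(*link)->right;
            is_leftmost = 0;     /* went right at least once */
        }
    }
    t->left = t->right = NULL;
    *link = t;

    if (is_leftmost)
        rq->leftmost = t;        /* keep the cache up to date */
}

/* O(1): the next task to run is simply the cached leftmost node. */
static struct task *pick_next(struct runqueue *rq)
{
    return rq->leftmost;
}

int main(void)
{
    struct runqueue rq = { 0 };
    struct task a = { 30, "editor" }, b = { 10, "compiler" }, c = { 20, "shell" };

    enqueue(&rq, &a);
    enqueue(&rq, &b);
    enqueue(&rq, &c);

    printf("next task: %s\n", pick_next(&rq)->name); /* prints "compiler" */
    return 0;
}
```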
I'd wager that, yes, the kernel has seen quite a lot of profiling and those data structures do seem to perform well in the real world.
This blog post has some nice data on how the kernel's linked lists are used. The data was apparently collected over three days of normal use by keeping counters of each operation performed.
+--------------------+-----------+
| Operation | Frequency |
+--------------------+-----------+
| Empty Test | 45% |
| Delete | 25% |
| Add | 23% |
| Iterate Start | 3.5% |
| Iterate Next | 2.5% |
| Replace | 0.76% |
| Other Manipulation | 0.0072% |
+--------------------+-----------+
As you can see, operations that actually access elements account for a tiny 6% of the total, while insertion and deletion add up to almost half. This is a use case where linked lists start to make much more sense. Note that the data was collected across the whole kernel, not specifically the TCP code, so the rationale for a linked list may not have been the same every time it was used, but the aggregate data does suggest that those were sane choices overall.
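It's also worth noting that the dominant operations in that table are exactly the ones that cost a constant handful of pointer reads and writes on an intrusive circular doubly linked list. The sketch below is a simplified illustration of that, not the kernel's actual list.h code, though it mirrors what those helpers do.

```c
/* Simplified sketch of why the dominant operations above are cheap:
 * empty test, add, and delete are each a few pointer operations,
 * independent of list length. Mirrors list.h in spirit, not its code. */
#include <stdio.h>

struct list_node {
    struct list_node *next, *prev;
};

/* Empty test: one comparison. */
static int list_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Add right after head: four pointer writes, O(1). */
static void list_add(struct list_node *new_node, struct list_node *head)
{
    new_node->next = head->next;
    new_node->prev = head;
    head->next->prev = new_node;
    head->next = new_node;
}

/* Delete an entry: two pointer writes, O(1). No search is needed,
 * because the caller already holds a pointer to the embedded node. */
static void list_del(struct list_node *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

int main(void)
{
    struct list_node head = { &head, &head };   /* empty circular list */
    struct list_node a;

    printf("empty? %d\n", list_empty(&head));   /* 1 */
    list_add(&a, &head);
    printf("empty? %d\n", list_empty(&head));   /* 0 */
    list_del(&a);
    printf("empty? %d\n", list_empty(&head));   /* 1 */
    return 0;
}
```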
I think it's also important to keep in mind that the kernel's design has to scale from the tiniest embedded devices to supercomputers handling massive amounts of data, and at that end of the spectrum the algorithmic efficiencies may start to seriously outweigh caching concerns.