Usually, lists are implemented either as linked lists, which are slow to traverse, or as array lists, which are slow when inserting elements.
I was wondering whether it would be possible to use the processor's MMU to implement lists more efficiently, by remapping instead of copying memory whenever an element is inserted or deleted. This would mean that both indexing and insertion/deletion anywhere in the array run in O(1) time, better than any other list implementation.
My questions are:
First some specific answers to your questions:
You can do this kind of remapping with `mmap` on UNIX-like OSes and similar APIs on Windows. Linux in particular has recently added several methods for advanced manipulation of user-visible buffers from the kernel without copying, though one of the more interesting ones is no longer for this world (at least performance-wise). The core problems with using MMU tricks are that (a) you can only "zero copy" whole pages, which pretty much means a 4K granularity or larger, along with similarly restrictive alignment, and (b) even if `mmap`-type calls are fast, so are efficient memory copy routines!
Let's look first at (a). If I understand you correctly, you want to accelerate insertion into something like `std::vector` by using MMU tricks to shift the elements that need to be moved when an insert happens. The problem is that you can only shift by 0, 4096, 8192, etc. bytes on typical systems! So if you insert one 4-byte `int` into a `vector<int>`, how does that help? You could perhaps "crack apart" the underlying storage of the `vector` into two parts at the insertion point and track that, with the hope of merging them again at some point (e.g., if you insert 4096 bytes' worth of stuff), but then you end up with a different data structure, with different properties, and the MMU tricks aren't really fundamental here anyway.
That brings us to (b). Take for granted that on my machine a page can be remapped in ~120 ns via `mmap`. That seems fast (and it's not bad when you consider it involves taking various kernel locks, messing with the page tables, adding a VMA, etc.), but copying memory is also very fast. On this same box, I can copy memory that goes to/from RAM (outside any cache level) at about 12 GB/s, while copies within L1 or L2 proceed at perhaps 80-100 GB/s. So copying a 4K page takes somewhere between ~41 ns (cached) and ~340 ns (uncached, to RAM). Messing with page tables isn't a clear win even where it is possible, especially in the cached case (and the cached case is probably the dominant one, averaged over most workloads).
So these types of tricks may make sense, but only in specific scenarios, such as the following:
The most common and useful example of MMU tricks is probably `realloc`. On Linux and (it seems) Windows, `realloc` may be implemented by remapping and extending the mapped pages in memory (aka MMU tricks), which avoids both the physical copy and the need to temporarily have the old allocated region and the new region "live" at once (which may be hard if their combined size approaches the size of physical memory).
In particular, recent versions of Linux will likely use `mremap` to `realloc` heap regions that were `mmap`ed in the first place (by default, this occurs for allocation requests larger than 128K, but it may also occur when the space available to `sbrk` is exhausted).