I am implementing a disk-based hashtable supporting a large number of keys (26+ million). The values are deserialized. Reads are essentially random throughout the file, values are smaller than the page size, and I am optimising for SSDs. Safety and consistency are not major concerns; performance is what matters.
My current solution involves using a mmap() file with MADV_RANDOM | MADV_DONTNEED set to disable prefetching by the kernel and only load data as needed on-demand.
I get the idea that the kernel reads from disk to memory buffer, and I deserialize from there.
What about O_DIRECT? If I call read(), I'm still copying into a buffer (which I deserialize from) so can I gain any advantage?
Where can I find more info on the buffers involved with a mmap() file and calling read() on a file opened with O_DIRECT?
I am not interested in read ahead or caching. It has nothing to offer for my use case.
Read uses the standard file descriptor access to files while mmap transparently maps files to locations in the process's memory. Most operating systems also use mmap every time a program gets loaded into memory for execution. Even though it is important and often used, mmap can be slow and inconsistent in its timing.
mmap works by manipulating your process's page table, a data structure your CPU uses to map address spaces. The CPU will translate "virtual" addresses to "physical" ones, and does so according to the page table set up by your kernel. When you access the mapped memory for the first time, your CPU generates a page fault.
The page fault happens after mmap has returned, once you start using the mapped segment.
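To make the page-fault behaviour above a bit more concrete, here is a minimal C sketch, assuming Linux/glibc. The helper names `map_table` and `read_byte` are made up for this example, and it only passes the `MADV_RANDOM` hint; treat it as an illustration rather than a tuned implementation.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE      /* exposes madvise() on glibc */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and hint random access.
 * Returns the mapping (NULL on failure) and stores its length in *len. */
static const unsigned char *map_table(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* mmap() only reserves address space; no file data is read yet. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);               /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    /* Advice is only a hint, so a failure here is ignored. */
    (void)madvise(p, (size_t)st.st_size, MADV_RANDOM);

    *len = (size_t)st.st_size;
    return p;
}

/* The first access to a page that is not yet resident triggers a page
 * fault; the kernel then brings that page in from the file. */
static unsigned char read_byte(const unsigned char *base, size_t off)
{
    return base[off];
}
```

When the table is no longer needed, the caller would release it with `munmap()`.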
When I ask my colleagues why mmap is faster than system calls, the answer is inevitably “system call overhead”: the cost of crossing the boundary between the user space and the kernel.
O_DIRECT is an option for read/write operations in which the data bypasses the system buffers and is copied directly between your buffer and the disk controller. To get the advantages of O_DIRECT, you need to meet some conditions: the buffer address must be aligned to the memory page size, and the buffer size must be aligned to the I/O block size.
Anyway, if you use mmap, you do not use read/write. Moreover, after mmap you can close the file descriptor and the mapping will still work. So O_DIRECT is useless with mmap.
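For the read() route, the alignment conditions on O_DIRECT could be handled roughly as in the sketch below. It assumes Linux and a 4096-byte alignment (`BLOCK`), which is a guess; the real requirement depends on the device and filesystem. `read_direct`, `align_down` and `align_up` are hypothetical names. Besides the buffer address and size, the sketch also aligns the file offset, which O_DIRECT typically requires.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux/glibc */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* no O_DIRECT here: degrades to a buffered read */
#endif

#define BLOCK 4096u          /* assumed alignment; check your device */

/* Round down / up to a multiple of BLOCK (BLOCK must be a power of two). */
static uint64_t align_down(uint64_t x) { return x & ~(uint64_t)(BLOCK - 1); }
static uint64_t align_up(uint64_t x)   { return align_down(x + BLOCK - 1); }

/* Hypothetical helper: copy `len` bytes at file offset `off` into `out`
 * through an aligned O_DIRECT read. Returns 0 on success, -1 on failure. */
static int read_direct(const char *path, uint64_t off, size_t len, void *out)
{
    if (len == 0)
        return 0;

    uint64_t start = align_down(off);                    /* aligned offset */
    size_t span = (size_t)(align_up(off + len) - start); /* aligned size   */

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    void *buf = NULL;                                    /* aligned buffer */
    if (posix_memalign(&buf, BLOCK, span) != 0) {
        close(fd);
        return -1;
    }

    /* A short read is fine at end of file, as long as it covers the record. */
    ssize_t n = pread(fd, buf, span, (off_t)start);
    int ok = n >= 0 && (uint64_t)n >= (off - start) + len;
    if (ok)
        memcpy(out, (char *)buf + (off - start), len);

    free(buf);
    close(fd);
    return ok ? 0 : -1;
}
```

Note that this still copies from the aligned buffer into `out`; a real implementation would probably keep the descriptor open and deserialize straight from the aligned buffer. Some filesystems (tmpfs on older kernels, for example) reject O_DIRECT at open() time.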
What I can recommend to increase performance:
If your subsystem gets many lookups for missing keys, you can keep a Bloom filter (http://en.wikipedia.org/wiki/Bloom_filter) in memory. You then test each search key against the Bloom filter and reject missing keys without any actual request to disk.
To conserve memory, use a two-level scheme: keep the bucket heads in mmap-ed memory, but read the buckets themselves from the file with pread().
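As an illustration of the Bloom filter idea, here is a small C sketch. The filter size, the number of probes, and the choice of FNV-1a with double hashing are all assumptions made for the example; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   (1u << 20)  /* assumed size: 1 Mi bits = 128 KiB */
#define BLOOM_HASHES 4u          /* assumed number of probes per key  */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

/* 64-bit FNV-1a with a seed mixed into the offset basis. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t seed)
{
    const uint8_t *p = data;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Double hashing: probe i lands on bit (h1 + i*h2) mod BLOOM_BITS. */
static uint32_t bloom_bit(const void *key, size_t len, unsigned i)
{
    uint64_t h1 = fnv1a(key, len, 0);
    uint64_t h2 = fnv1a(key, len, 0x9e3779b97f4a7c15ULL) | 1;
    return (uint32_t)((h1 + i * h2) % BLOOM_BITS);
}

static void bloom_init(bloom_t *b)
{
    memset(b->bits, 0, sizeof b->bits);
}

static void bloom_add(bloom_t *b, const void *key, size_t len)
{
    for (unsigned i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bloom_bit(key, len, i);
        b->bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }
}

/* Returns 0 if the key is definitely absent (skip the disk lookup),
 * 1 if it may be present (false positives are possible). */
static int bloom_maybe_contains(const bloom_t *b, const void *key, size_t len)
{
    for (unsigned i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bloom_bit(key, len, i);
        if (!(b->bits[bit >> 3] & (1u << (bit & 7))))
            return 0;
    }
    return 1;
}
```

A lookup would call `bloom_maybe_contains()` first and go to disk only when it returns 1. For 26+ million keys the filter would need to be much larger than the size assumed here to keep the false-positive rate low.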
I implemented both options in my autocomplete subsystem; you can see it online here: http://olegh.ftp.sh/autocomplete.html and judge the performance on a slow old computer, a Celeron-300.
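The two-level scheme could look roughly like the C sketch below. This is not the code from the autocomplete subsystem; the on-disk record layout (`entry_t`), the use of offset 0 as the end-of-chain marker, and the function names are invented for the example. The `heads` array stands in for the bucket heads, which could live in an mmap-ed region.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* pread(), pwrite() */
#endif
#include <fcntl.h>               /* open(), for callers */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical on-disk chain entry. `next` is the file offset of the
 * next entry in the same bucket; 0 marks the end of the chain. */
typedef struct {
    uint64_t key;
    uint64_t value;
    uint64_t next;
} entry_t;

/* Store one entry at the given file offset. Returns 0 on success. */
static int write_entry(int fd, uint64_t off, const entry_t *e)
{
    return pwrite(fd, e, sizeof *e, (off_t)off) == (ssize_t)sizeof *e ? 0 : -1;
}

/* Level 1: heads[] holds the file offset of each bucket's first entry.
 * Level 2: the entries themselves are fetched from disk with pread().
 * Returns 1 if found (*value set), 0 if absent, -1 on I/O error. */
static int lookup(int fd, const uint64_t *heads, size_t nbuckets,
                  uint64_t key, uint64_t *value)
{
    uint64_t off = heads[key % nbuckets];
    while (off != 0) {
        entry_t e;
        if (pread(fd, &e, sizeof e, (off_t)off) != (ssize_t)sizeof e)
            return -1;
        if (e.key == key) {
            *value = e.value;
            return 1;
        }
        off = e.next;            /* follow the chain within the bucket */
    }
    return 0;
}
```

Only the heads array is touched in memory, so a lookup for a key in an empty bucket costs no disk read at all, and each chain step costs exactly one pread().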