I have a C++ app on Linux which is extremely latency sensitive. My memory usage is around 2 GB, so with 4 KB pages and 64 TLB entries I am going to be encountering TLB misses.
I read in the Intel developer manuals that 2 MB (or 4 MB?) "huge" pages only halve the number of TLB entries, so the increase in memory range covered per entry offsets the reduction in entries and should be better for performance.
How do I allocate memory using "huge" pages in a C++ application? Are there any trade-offs I should be aware of?
My Linux is a Red Hat distribution.
Since shared mappings are always backed by files in the hugetlbfs filesystem, the hugetlbfs code ensures each inode contains a reservation map. As a result, the reservation map is allocated when the inode is created.
Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages.
Transparent HugePages are similar to standard HugePages. However, while standard HugePages are allocated at startup, Transparent HugePages are allocated dynamically at runtime by the khugepaged kernel thread, using swappable HugePages.
You can also try transparent huge page support, which is available on any kernel from the last several years (at least anything in the 3.x and 4.x range, as well as various 2.6.x kernels).
The primary benefit is that you don't need any special hugetlbfs setup; it "just works". The downside is that it isn't guaranteed: the kernel may satisfy your allocations with huge pages if certain conditions are met and some are available. Unlike hugetlbfs, which reserves a fixed number of huge pages at startup that are only available via specific calls, transparent huge pages are carved out of the general memory pool. This requires contiguous 2 MB blocks of physical memory, which may become rare as system uptime grows due to physical memory fragmentation. Furthermore, there are various kernel tunables that affect whether you get huge pages or not, the most important of which is /sys/kernel/mm/transparent_hugepage/enabled.
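To check which mode is active at run time, a minimal sketch like the following can read that tunable (assuming a kernel with THP support; the bracketed token in the file marks the active setting):

```cpp
// Minimal sketch: print the current THP mode. The file contains e.g.
// "always [madvise] never", with brackets around the active setting.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream f("/sys/kernel/mm/transparent_hugepage/enabled");
    std::string mode;
    if (f && std::getline(f, mode))
        std::cout << "THP mode: " << mode << '\n';
    else
        std::cerr << "THP support not available\n";
}
```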
Your best bet is to allocate blocks on a 2 MB boundary with posix_memalign and then call madvise(MADV_HUGEPAGE) on the allocated region before touching it for the first time. It also works with variants like aligned_alloc. In my experience, on systems that have /sys/kernel/mm/transparent_hugepage/enabled set to always, this generally results in huge pages. However, I've mostly used it on systems with significant free memory and not-too-long uptimes.
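A minimal sketch of that approach, assuming a 2 MB transparent huge page size (typical on x86-64) and an arbitrary illustrative region size:

```cpp
// Minimal sketch: 2 MB-aligned allocation advised to use transparent
// huge pages before first touch. Sizes here are assumptions for
// illustration.
#include <sys/mman.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const size_t kHugePage = 2 * 1024 * 1024;  // assumed THP size
    const size_t size = 64 * kHugePage;        // 128 MB, multiple of 2 MB

    void* p = nullptr;
    if (posix_memalign(&p, kHugePage, size) != 0) {
        perror("posix_memalign");
        return 1;
    }
    // Advise before the first touch so the initial page faults can be
    // satisfied with huge pages instead of relying on later collapse.
    if (madvise(p, size, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");  // non-fatal: THP may be off

    std::memset(p, 0, size);  // first touch populates the pages
    free(p);
}
```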
If you are using 2 GB of memory, you could probably get a significant benefit from huge pages. If you allocate it all in small blocks, e.g. via malloc, there is a high chance transparent huge pages won't kick in, so consider allocating whatever uses the bulk of your memory (often a single object type) in a THP-aware way.
I also wrote a library to determine whether you actually got huge pages from any given allocation. This probably isn't useful in a production application, but it can be a helpful diagnostic if you go the THP route, since at least you can confirm whether you got them or not.
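That library's API isn't reproduced here, but the underlying idea can be sketched by scanning /proc/self/smaps for the AnonHugePages counter of the mapping that contains a given address (anon_huge_kb is a hypothetical helper name, not the library's API):

```cpp
// Rough sketch (not the library mentioned above): find the smaps entry
// for the mapping containing `addr` and report its AnonHugePages total.
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Returns the AnonHugePages value in kB for the mapping containing addr,
// or -1 if no such mapping is found.
long anon_huge_kb(const void* addr) {
    std::ifstream smaps("/proc/self/smaps");
    std::string line;
    bool in_target = false;
    const uintptr_t target = reinterpret_cast<uintptr_t>(addr);
    while (std::getline(smaps, line)) {
        uintptr_t lo, hi;
        char dash;
        std::istringstream hdr(line);
        if (hdr >> std::hex >> lo >> dash >> hi && dash == '-') {
            // Range header line, e.g. "7f0000000000-7f0000200000 rw-p ..."
            in_target = (lo <= target && target < hi);
        } else if (in_target && line.rfind("AnonHugePages:", 0) == 0) {
            return std::stol(line.substr(14));  // value in kB
        }
    }
    return -1;
}
```

Calling it on a pointer into the region after first touch and getting 0 back suggests the region was not backed by transparent huge pages.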
The "hugetlb" documentation from the kernel should help here.
Users can use the huge page support in Linux kernel by either using the mmap system call or standard SYSV shared memory system calls (shmget, shmat).
And:
Examples
1) map_hugetlb: see tools/testing/selftests/vm/map_hugetlb.c
2) hugepage-shm: see tools/testing/selftests/vm/hugepage-shm.c
3) hugepage-mmap: see tools/testing/selftests/vm/hugepage-mmap.c
4) The libhugetlbfs (https://github.com/libhugetlbfs/libhugetlbfs) library provides a wide range of userspace tools to help with huge page usability, environment setup, and control.
(These paths refer to the Linux source tree.)
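For the SYSV shared memory route, a minimal sketch using shmget with SHM_HUGETLB might look like this (assuming 2 MB huge pages and a pre-configured pool, e.g. vm.nr_hugepages > 0):

```cpp
// Minimal sketch: SYSV shared memory segment backed by huge pages.
// Requires reserved huge pages (vm.nr_hugepages) and appropriate
// permissions (CAP_IPC_LOCK or membership in vm.hugetlb_shm_group).
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstddef>
#include <cstdio>

int main() {
    const size_t size = 2 * 1024 * 1024;  // assumed 2 MB huge page size

    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id < 0) { perror("shmget(SHM_HUGETLB)"); return 1; }

    void* p = shmat(id, nullptr, 0);
    if (p == reinterpret_cast<void*>(-1)) { perror("shmat"); return 1; }

    static_cast<char*>(p)[0] = 1;      // touch the huge page
    shmdt(p);
    shmctl(id, IPC_RMID, nullptr);     // mark segment for removal
}
```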
So it basically boils down to:
mmap with the MAP_HUGETLB flag.
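A minimal sketch of that call (again assuming 2 MB huge pages and a configured pool; the mapping fails with ENOMEM if no huge pages are reserved):

```cpp
// Minimal sketch: anonymous mapping carved from the hugetlbfs pool.
#include <sys/mman.h>
#include <cstdio>
#include <cstring>

int main() {
    const size_t size = 2 * 1024 * 1024;  // assumed 2 MB huge page size

    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) { perror("mmap(MAP_HUGETLB)"); return 1; }

    std::memset(p, 0, size);  // touch it; pages here are guaranteed huge
    munmap(p, size);
}
```

Unlike the THP approach above, this allocation is guaranteed to use huge pages if it succeeds at all.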