I have a C++ app on Linux which is extremely latency sensitive. My memory usage is around 2 GB, so with 4 KB pages and 64 TLB entries I am going to be encountering TLB misses.
I read in the Intel developer manuals that 2 MB (or 4 MB?) "huge" pages only halve the number of TLB entries, so the increase in memory range covered per entry offsets the reduction in entries and should be better for performance.
How do I allocate memory using "huge" pages in a C++ application? Are there any trade-offs I should be aware of?
My Linux is a Red Hat distribution.
Since shared mappings are always backed by files in the hugetlbfs filesystem, the hugetlbfs code ensures each inode contains a reservation map. As a result, the reservation map is allocated when the inode is created.
Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages.
Transparent HugePages are similar to standard HugePages. However, while standard HugePages are allocated at startup, Transparent HugePages are allocated dynamically at runtime by the khugepaged kernel thread, using swappable HugePages.
You can also try transparent huge page support, which is available on any kernel from the last several years (at least anything in the 3.x and 4.x range, as well as various 2.6.x kernels).
The primary benefit is that you don't need any special hugetlbfs setup; it "just works". The downside is that it isn't guaranteed: the kernel may satisfy your allocations with huge pages if certain conditions are met and some are available. Unlike hugetlbfs, which reserves a fixed number of huge pages at startup that are only available via specific calls, transparent huge pages are carved out of the general memory pool. This requires contiguous 2 MB blocks of physical memory, which may become rare as system uptime grows due to physical memory fragmentation. Furthermore, there are various kernel tunables that affect whether you get huge pages or not, the most important of which is /sys/kernel/mm/transparent_hugepage/enabled.
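To check which mode is active at run time, a minimal sketch like the following can read that tunable (assuming a kernel with THP support; the bracketed token in the file marks the active setting):

```cpp
// Minimal sketch: print the current THP mode. The file contains e.g.
// "always [madvise] never", with brackets around the active setting.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream f("/sys/kernel/mm/transparent_hugepage/enabled");
    std::string mode;
    if (f && std::getline(f, mode))
        std::cout << "THP mode: " << mode << '\n';
    else
        std::cerr << "THP support not available\n";
}
```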
Your best bet is to allocate blocks on a 2 MB boundary with posix_memalign and then call madvise(MADV_HUGEPAGE) on the allocated region before touching it for the first time. It also works with variants like aligned_alloc. In my experience, on systems that have /sys/kernel/mm/transparent_hugepage/enabled set to always, this generally results in huge pages. However, I've mostly used it on systems with significant free memory and not-too-long uptimes.
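A minimal sketch of that approach, assuming a 2 MB transparent huge page size (typical on x86-64) and an arbitrary illustrative region size:

```cpp
// Minimal sketch: 2 MB-aligned allocation advised to use transparent
// huge pages before first touch. Sizes here are assumptions for
// illustration.
#include <sys/mman.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const size_t kHugePage = 2 * 1024 * 1024;  // assumed THP size
    const size_t size = 64 * kHugePage;        // 128 MB, multiple of 2 MB

    void* p = nullptr;
    if (posix_memalign(&p, kHugePage, size) != 0) {
        perror("posix_memalign");
        return 1;
    }
    // Advise before the first touch so the initial page faults can be
    // satisfied with huge pages instead of relying on later collapse.
    if (madvise(p, size, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");  // non-fatal: THP may be off

    std::memset(p, 0, size);  // first touch populates the pages
    free(p);
}
```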
If you are using 2 GB of memory, you could probably get a significant benefit from huge pages. If you allocate it all in small blocks, e.g. via malloc, there is a high chance transparent huge pages won't kick in, so consider allocating whatever uses the bulk of your memory (often a single object type) in a THP-aware way.
I also wrote a library to determine whether you actually got huge pages from any given allocation. This probably isn't useful in a production application, but it can be a helpful diagnostic if you go the THP route, since at least you can confirm whether you got them or not.
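That library's API isn't reproduced here, but the underlying idea can be sketched by scanning /proc/self/smaps for the AnonHugePages counter of the mapping that contains a given address (anon_huge_kb is a hypothetical helper name, not the library's API):

```cpp
// Rough sketch (not the library mentioned above): find the smaps entry
// for the mapping containing `addr` and report its AnonHugePages total.
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Returns the AnonHugePages value in kB for the mapping containing addr,
// or -1 if no such mapping is found.
long anon_huge_kb(const void* addr) {
    std::ifstream smaps("/proc/self/smaps");
    std::string line;
    bool in_target = false;
    const uintptr_t target = reinterpret_cast<uintptr_t>(addr);
    while (std::getline(smaps, line)) {
        uintptr_t lo, hi;
        char dash;
        std::istringstream hdr(line);
        if (hdr >> std::hex >> lo >> dash >> hi && dash == '-') {
            // Range header line, e.g. "7f0000000000-7f0000200000 rw-p ..."
            in_target = (lo <= target && target < hi);
        } else if (in_target && line.rfind("AnonHugePages:", 0) == 0) {
            return std::stol(line.substr(14));  // value in kB
        }
    }
    return -1;
}
```

Calling it on a pointer into the region after first touch and getting 0 back suggests the region was not backed by transparent huge pages.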
The "hugetlb" documentation from the kernel should help here.
Users can use the huge page support in Linux kernel by either using the mmap system call or standard SYSV shared memory system calls (shmget, shmat).
And:
Examples
1) map_hugetlb: see tools/testing/selftests/vm/map_hugetlb.c
2) hugepage-shm: see tools/testing/selftests/vm/hugepage-shm.c
3) hugepage-mmap: see tools/testing/selftests/vm/hugepage-mmap.c
4) The libhugetlbfs (https://github.com/libhugetlbfs/libhugetlbfs) library provides a wide range of userspace tools to help with huge page usability, environment setup, and control.
(These paths refer to the Linux source tree.)
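For the SYSV shared memory route, a minimal sketch using shmget with SHM_HUGETLB might look like this (assuming 2 MB huge pages and a pre-configured pool, e.g. vm.nr_hugepages > 0):

```cpp
// Minimal sketch: SYSV shared memory segment backed by huge pages.
// Requires reserved huge pages (vm.nr_hugepages) and appropriate
// permissions (CAP_IPC_LOCK or membership in vm.hugetlb_shm_group).
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstddef>
#include <cstdio>

int main() {
    const size_t size = 2 * 1024 * 1024;  // assumed 2 MB huge page size

    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id < 0) { perror("shmget(SHM_HUGETLB)"); return 1; }

    void* p = shmat(id, nullptr, 0);
    if (p == reinterpret_cast<void*>(-1)) { perror("shmat"); return 1; }

    static_cast<char*>(p)[0] = 1;      // touch the huge page
    shmdt(p);
    shmctl(id, IPC_RMID, nullptr);     // mark segment for removal
}
```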
So it basically boils down to:
mmap with the MAP_HUGETLB flag.
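A minimal sketch of that call (again assuming 2 MB huge pages and a configured pool; the mapping fails with ENOMEM if no huge pages are reserved):

```cpp
// Minimal sketch: anonymous mapping carved from the hugetlbfs pool.
#include <sys/mman.h>
#include <cstdio>
#include <cstring>

int main() {
    const size_t size = 2 * 1024 * 1024;  // assumed 2 MB huge page size

    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) { perror("mmap(MAP_HUGETLB)"); return 1; }

    std::memset(p, 0, size);  // touch it; pages here are guaranteed huge
    munmap(p, size);
}
```

Unlike the THP approach above, this allocation is guaranteed to use huge pages if it succeeds at all.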