
C++ application fails to allocate more hugepages than a certain limit

Overview

I have a C++ application that reads a large amount of data (~1T). I run it using hugepages (614400 pages at 2M each), and this works until it hits 128G.

For testing, I created a simple C++ application that allocates chunks of 2M until it can't.

Application is run using:

LD_PRELOAD=/usr/lib64/libhugetlbfs.so HUGETLB_MORECORE=yes ./a.out 

While it runs I monitor the number of free hugepages (from /proc/meminfo) and can see that it consumes hugepages at the expected rate.
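
For reference, the counters can be watched from a second shell with something like this (assuming watch and grep are available; this is just a convenient equivalent of what I do):

watch -n 1 'grep -i huge /proc/meminfo'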

However the application crashes with a std::bad_alloc exception at 128G allocated (or 65536 pages).

If I run two or more instances at the same time, they all crash at 128G each.

If I decrease the hugetlb cgroup limit to something small, say 16G, the app does fail at that point as expected, with a 'bus error'.

Am I missing something trivial? Please look below for details.

I'm running out of ideas...

Details

Machine, OS and software:

CPU    :  Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Memory :  1.5T
Kernel :  3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
OS     :  CentOS Linux release 7.4.1708 (Core)

hugetlbfs : 2.16-12.el7
gcc       : 7.2.1 20170829

Simple test code I used (it allocates 2M chunks until the number of free hugepages drops below a limit):

#include <iostream>
#include <fstream>
#include <vector>
#include <array>
#include <string>

#define MEM512K 512*1024ul
#define MEM2M   4*MEM512K

// data block
template <size_t N>
struct DataBlock {
  char data[N];
};

// Hugepage info
struct HugePageInfo {
  size_t memfree;
  size_t total;
  size_t free;
  size_t size;
  size_t used;
  double used_size;
};

// dump hugepage info
void dumpHPI(const HugePageInfo & hpi) {
  std::cout << "HugePages total : " << hpi.total << std::endl;
  std::cout << "HugePages free  : " << hpi.free << std::endl;
  std::cout << "HugePages size  : " << hpi.size << std::endl;
}

// dump hugepage info in one line
void dumpHPIline(const size_t i, const HugePageInfo & hpi) {
  std::cout << i << " "
            << hpi.memfree << " "
            << hpi.total-hpi.free << " "
            << hpi.free << " "
            << hpi.used_size
            << std::endl;
}

// get hugepage info from /proc/meminfo
void getHugePageInfo( HugePageInfo & hpi ) {
  std::ifstream fmeminfo;
  fmeminfo.open("/proc/meminfo",std::ifstream::in);
  std::string line;
  size_t n=0;
  while (fmeminfo.good()) {
    std::getline(fmeminfo,line);
    const size_t sep = line.find_first_of(':');
    if (sep==std::string::npos) continue;

    const std::string lblstr = line.substr(0,sep);
    const size_t      endpos = line.find(" kB");
    const std::string trmstr = line.substr(sep+1,(endpos==std::string::npos ? line.size() : endpos-sep-1));
    const size_t      startpos = trmstr.find_first_not_of(' ');
    const std::string valstr = (startpos==std::string::npos ? trmstr : trmstr.substr(startpos) );
    if (lblstr=="HugePages_Total") {
      hpi.total = std::stoi(valstr);
    } else if (lblstr=="HugePages_Free") {
      hpi.free = std::stoi(valstr);
    } else if (lblstr=="Hugepagesize") {
      hpi.size = std::stoi(valstr);
    } else if (lblstr=="MemFree") {
      hpi.memfree = std::stoi(valstr);
    }
  }
  hpi.used = hpi.total - hpi.free;
  hpi.used_size = double(hpi.used*hpi.size)/1024.0/1024.0;
}

// allocate data
void test_rnd_data() {
  typedef DataBlock<MEM2M> elem_t;

  HugePageInfo hpi;
  getHugePageInfo(hpi);
  dumpHPIline(0,hpi);

  std::array<elem_t *,MEM512K> memmap;
  for (size_t i=0; i<memmap.size(); i++) memmap[i]=nullptr;

  for (size_t i=0; i<memmap.size(); i++) {
    // allocate a new 2M block
    memmap[i] = new elem_t();

    // output progress
    if (i%1000==0) {
      getHugePageInfo(hpi);
      dumpHPIline(i,hpi);
      if (hpi.free<1000) break;
    }
  }

  std::cout << "Cleaning up...." << std::endl;
  for (size_t i=0; i<memmap.size(); i++) {
    if (memmap[i]==nullptr) continue;
    delete memmap[i];
  }
}

int main(int argc, const char** argv) {
  test_rnd_data();
}

Hugepages are set up at boot time: 614400 pages at 2M each.
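
(For reference, a pool like this is normally reserved via kernel command-line parameters along the lines of the following; the exact boot configuration is not important for the problem.)

hugepagesz=2M hugepages=614400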

From /proc/meminfo:

MemTotal:       1584978368 kB
MemFree:        311062332 kB
MemAvailable:   309934096 kB
Buffers:            3220 kB
Cached:           613396 kB
SwapCached:            0 kB
Active:           556884 kB
Inactive:         281648 kB
Active(anon):     224604 kB
Inactive(anon):    15660 kB
Active(file):     332280 kB
Inactive(file):   265988 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        222280 kB
Mapped:            89784 kB
Shmem:             18348 kB
Slab:             482556 kB
SReclaimable:     189720 kB
SUnreclaim:       292836 kB
KernelStack:       11248 kB
PageTables:        14628 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    165440732 kB
Committed_AS:    1636296 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     7789100 kB
VmallocChunk:   33546287092 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:   614400
HugePages_Free:    614400
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      341900 kB
DirectMap2M:    59328512 kB
DirectMap1G:    1552941056 kB

Limits from ulimit:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 6191203
max locked memory       (kbytes, -l) 1258291200
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

cgroup limit:

> cat /sys/fs/cgroup/hugetlb/hugetlb.2MB.limit_in_bytes
9223372036854771712

Tests

Output when running the test code using HUGETLB_DEBUG=1:

...
libhugetlbfs [abc:185885]: INFO: Attempting to map 2097152 bytes
libhugetlbfs [abc:185885]: INFO: ... = 0x1ffb200000
libhugetlbfs [abc:185885]: INFO: hugetlbfs_morecore(2097152) = ...
libhugetlbfs [abc:185885]: INFO: heapbase = 0xa00000, heaptop = 0x1ffb400000, mapsize = 1ffaa00000, delta=2097152
libhugetlbfs [abc:185885]: INFO: Attempting to map 2097152 bytes
libhugetlbfs [abc:185885]: WARNING: New heap segment map at 0x1ffb400000 failed: Cannot allocate memory
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Using strace:

...
mmap(0x1ffb400000, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0x1ffa200000) = 0x1ffb400000
mmap(0x1ffb600000, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0x1ffa400000) = 0x1ffb600000
mmap(0x1ffb800000, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0x1ffa600000) = 0x1ffb800000
mmap(0x1ffba00000, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0x1ffa800000) = 0x1ffba00000
mmap(0x1ffbc00000, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0x1ffaa00000) = -1 ENOMEM (Cannot allocate memory)
write(2, "libhugetlbfs", 12)            = 12
write(2, ": WARNING: New heap segment map "..., 79) = 79
mmap(NULL, 3149824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 67108864, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "terminate called after throwing "..., 48) = 48
write(2, "std::bad_alloc", 14)          = 14
write(2, "'\n", 2)                      = 2
write(2, "  what():  ", 11)             = 11
write(2, "std::bad_alloc", 14)          = 14
write(2, "\n", 1)                       = 1
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid()                                = 188617
tgkill(188617, 188617, SIGABRT)         = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=188617, si_uid=1001} ---

Finally in /proc/pid/numa_maps:

...
1ffb000000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N1=1 kernelpagesize_kB=2048
1ffb200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N1=1 kernelpagesize_kB=2048
1ffb400000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N1=1 kernelpagesize_kB=2048
1ffb600000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N1=1 kernelpagesize_kB=2048
1ffb800000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N1=1 kernelpagesize_kB=2048
...
asked Apr 11 '18 by Fredrik Tegenfeldt




1 Answer

However the application crashes with a std::bad_alloc exception at 128G allocated (or 65536 pages).

You are allocating too many small segments: there is a limit on the number of memory mappings a single process can have, controlled by vm.max_map_count.

sysctl -n vm.max_map_count 

Your test tries to create at least 512 * 1024 == 524288 separate mappings (one mmap per 2M block, as your strace output shows), plus one more for the array, but the default value of vm.max_map_count is only about 65536 (65530 to be exact), which matches the point where your allocations start failing (~65536 pages, i.e. 128G).
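
You can check how close the process is to that limit by counting its mappings shortly before the crash; /proc/<pid>/maps has one line per mapping (replace <pid> with the actual process id):

wc -l /proc/<pid>/maps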

You can change it with:

sysctl -w vm.max_map_count=3000000 
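
Note that sysctl -w only changes the running kernel; to keep the setting across reboots, add it to /etc/sysctl.conf (or a file under /etc/sysctl.d/):

vm.max_map_count=3000000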

Or you could allocate bigger segments in your code, so that far fewer mappings are needed (see the sketch below).
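
As a rough sketch of the second option (this is not your original program; the 1G chunk size and the 1536-chunk stop limit are purely illustrative), allocating in much larger chunks should let libhugetlbfs extend the heap with one large mapping per request instead of one mapping per 2M block, keeping the mapping count far below the limit:

#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

int main() {
  // Illustrative sizes: 1G chunks instead of 2M chunks means ~512x fewer
  // heap extensions, and therefore far fewer mappings.
  const std::size_t kChunk     = 1024ul * 1024ul * 1024ul; // 1G per allocation (illustrative)
  const std::size_t kPage      = 2ul * 1024ul * 1024ul;    // 2M hugepage size
  const std::size_t kMaxChunks = 1536;                     // stop around 1.5T (illustrative)

  std::vector<char*> chunks;
  chunks.reserve(kMaxChunks);

  try {
    for (std::size_t i = 0; i < kMaxChunks; ++i) {
      // With HUGETLB_MORECORE=yes this request should be served from the
      // hugepage-backed heap.
      char* p = new char[kChunk];
      // Touch one byte per 2M page so the hugepages are actually faulted in.
      for (std::size_t off = 0; off < kChunk; off += kPage)
        p[off] = 1;
      chunks.push_back(p);
      std::cout << "allocated " << chunks.size() << " G" << std::endl;
    }
  } catch (const std::bad_alloc&) {
    std::cout << "allocation failed after " << chunks.size() << " G" << std::endl;
  }

  for (char* p : chunks) delete[] p;
  return 0;
}

Run it the same way as before (LD_PRELOAD=/usr/lib64/libhugetlbfs.so HUGETLB_MORECORE=yes ./a.out); it should still consume 2M hugepages at the same rate, but with on the order of a couple of thousand mappings instead of hundreds of thousands.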

answered by Stargateur
