I recently learned about jemalloc, the memory allocator used by Firefox. I tried integrating it into my system by overriding the new and delete operators and calling the jemalloc equivalents of malloc and free, i.e. je_malloc and je_free. I wrote a test application that performs 100 million allocations and ran it with both glibc malloc and jemalloc. With jemalloc the allocations take less time, but CPU utilization is quite high and the memory footprint is larger than with malloc. After reading this document on jemalloc analysis, it seems that jemalloc may have a larger footprint than malloc because it employs techniques that optimize for speed rather than memory. However, I have not found any pointers on CPU usage with jemalloc. I am working on a multiprocessor machine, the details of which are given below.
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
stepping : 2
cpu MHz : 3325.117
cache size : 12288 KB
physical id : 1
siblings : 12
core id : 10
cpu cores : 6
apicid : 53
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips : 6649.91
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: [8]
I am using top -c -b -d 1.10 -p 24670 | awk -v time=$TIME '{print time,",",$9}' to keep track of the CPU usage.
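For reference, the kind of integration described above (overriding new and delete so that they call an allocator's malloc and free) can be sketched roughly as follows. This is a minimal sketch, not the actual code from the test application. The macros ALLOC_FN and FREE_FN and the two counters are hypothetical names introduced for illustration; they default to std::malloc and std::free so the sketch builds without jemalloc. With a jemalloc build that exports je_-prefixed symbols, they would presumably be defined as je_malloc and je_free after including the jemalloc header.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical hook macros (not part of any library). With a jemalloc build
// that uses the je_ prefix, these would be defined as je_malloc / je_free.
// They default to the C library so the sketch compiles anywhere.
#ifndef ALLOC_FN
#define ALLOC_FN std::malloc
#endif
#ifndef FREE_FN
#define FREE_FN std::free
#endif

// Illustrative counters that make the routing observable.
static std::atomic<std::size_t> g_allocs{0};
static std::atomic<std::size_t> g_frees{0};

// Replacement for the global allocation function. The default array and
// nothrow forms forward to this one, so they are covered as well.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // operator new must return a unique pointer
    for (;;) {
        if (void* p = ALLOC_FN(size)) {
            g_allocs.fetch_add(1, std::memory_order_relaxed);
            return p;
        }
        // Standard protocol on failure: call the new-handler if one is set.
        std::new_handler handler = std::get_new_handler();
        if (!handler) throw std::bad_alloc();
        handler();
    }
}

// Replacement for the global deallocation function.
void operator delete(void* p) noexcept {
    if (!p) return;  // deleting a null pointer is a no-op
    g_frees.fetch_add(1, std::memory_order_relaxed);
    FREE_FN(p);
}

// Sized form (C++14) forwards to the unsized one.
void operator delete(void* p, std::size_t) noexcept {
    operator delete(p);
}
```

The important design point is that allocation and deallocation go through the same allocator: a pointer obtained from one allocator must never be released through another.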
Has anyone had similar experiences while integrating jemalloc?
Thanks!
As one wise speaker said at CppCon, you should never guess about performance. You have to measure it instead.
I tried jemalloc with a multithreaded Linux application. It was a custom application-level protocol server (over TCP/IP). This C++ application called some Java code via JNI (roughly 5% of the time was spent in Java and 95% in C++ code). I ran 2 application instances in production mode, each with 150 threads.
After 72 hours of running, the glibc instance used 900 MB of memory and the jemalloc instance used 2.2 GB. I did not see a significant difference in CPU usage. Actual performance (average time to serve a client request) was about the same for both instances.
So, in my test glibc was much better than jemalloc. Of course, this is specific to my application.
Conclusion: if you have reason to think that your application's memory management is inefficient because of fragmentation, you have to run a test similar to the one I described. It is the only reliable source of information for your specific needs. If jemalloc were always better than glibc, glibc would make jemalloc its official allocator. If glibc were always better, jemalloc would cease to exist. When competitors coexist for a long time, it means that each one has its own niche.
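A test of this kind can be as simple as timing a batch of allocations and recording CPU time and peak memory alongside wall-clock time. The sketch below is one possible harness, not the test used above. The names AllocStats and run_alloc_benchmark are hypothetical, and it assumes a POSIX system for getrusage. It calls plain malloc and free, so whichever allocator the process is linked against or preloaded with is the one being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <vector>
#include <sys/resource.h>  // getrusage (POSIX only)

// Hypothetical result type for this sketch.
struct AllocStats {
    double wall_seconds;  // elapsed real time
    double cpu_seconds;   // processor time consumed by the process
    long peak_rss_kb;     // ru_maxrss: kilobytes on Linux (bytes on macOS)
};

// Allocates `count` blocks of `block_size` bytes, frees them all, and
// reports wall time, CPU time, and peak resident set size.
AllocStats run_alloc_benchmark(std::size_t count, std::size_t block_size) {
    std::vector<void*> blocks;
    blocks.reserve(count);

    auto wall_start = std::chrono::steady_clock::now();
    std::clock_t cpu_start = std::clock();

    for (std::size_t i = 0; i < count; ++i) {
        void* p = std::malloc(block_size);
        if (!p) break;  // stop on allocation failure
        // Touch the block so the memory is really committed.
        static_cast<volatile char*>(p)[0] = 1;
        blocks.push_back(p);
    }
    for (void* p : blocks) std::free(p);

    std::clock_t cpu_end = std::clock();
    auto wall_end = std::chrono::steady_clock::now();

    rusage usage{};
    getrusage(RUSAGE_SELF, &usage);

    AllocStats stats;
    stats.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    stats.cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    stats.peak_rss_kb = usage.ru_maxrss;
    return stats;
}
```

Comparing cpu_seconds with wall_seconds separates "the process used more CPU in total" from "the process finished the same work in less time", which a percentage sampled from top cannot distinguish on its own. A synthetic loop like this only complements a long-running test with the real workload; it does not replace it.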