 

How does NUMA architecture affect the performance of ActivePivot?

We are migrating an ActivePivot application to a new server (4-socket Intel Xeon, 512GB of memory). After deploying, we ran our application benchmark (a mix of large OLAP queries running concurrently with real-time transactions). The measured performance is almost twice as slow as on our previous server, which has similar processors but half as many cores and half as much memory.

We have investigated the differences between the two servers, and it appears the bigger one has a NUMA (non-uniform memory access) architecture. Each CPU socket is physically close to 1/4 of the memory, but further away from the rest of it. The JVM that runs our application allocates one large global heap, so an arbitrary fraction of that heap ends up on each NUMA node. Our analysis is that the memory access pattern is fairly random and that CPU cores frequently waste time accessing remote memory.

We are looking for more feedback about running ActivePivot on NUMA servers. Can we configure ActivePivot cubes or thread pools, change our queries, or configure the operating system?

Jack asked Oct 31 '12 14:10



2 Answers

Peter described the general JVM options available today to reduce the performance impact of NUMA architectures. In short, a NUMA-aware JVM partitions the heap with respect to the NUMA nodes, and when a thread creates a new object, the object is allocated in the NUMA node of the core that runs that thread (so if the same thread uses it later, the object is in local memory). Also, when compacting the heap, the NUMA-aware JVM avoids moving large chunks of data between nodes (which reduces the length of stop-the-world events).

So on any NUMA hardware and for any Java application the -XX:+UseNUMA option should probably be enabled.
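
As a rough illustration of the option above, a launch line might look like the sketch below. The jar name and heap size are placeholders, not values from the question. Pairing the flag with -XX:+UseParallelGC is also an assumption: NUMA-aware allocation was originally implemented for the parallel collector, and which other collectors honor -XX:+UseNUMA depends on the JDK version.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the original post).
APP_JAR="activepivot-app.jar"
HEAP="256g"

# Build the launch command as a string so it can be reviewed before running.
jvm_numa_cmd() {
  printf 'java -Xms%s -Xmx%s -XX:+UseParallelGC -XX:+UseNUMA -jar %s\n' \
    "$HEAP" "$HEAP" "$APP_JAR"
}

jvm_numa_cmd   # prints the command; run it with: eval "$(jvm_numa_cmd)"
```

Printing the command instead of executing it makes it easy to check the flags against the JVM version in use before committing to a long benchmark run.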

But for ActivePivot that does not help much: ActivePivot is an in-memory database. There are real-time updates, but the bulk of the data resides in main memory for the life of the application. Whatever the JVM options, the data will be split among NUMA nodes, and the threads that execute queries will access memory randomly. Because most sections of the ActivePivot query engine run as fast as memory can be fetched, the NUMA impact is particularly visible.

So how can you get the most from your ActivePivot solution on NUMA hardware?

There is an easy solution when the ActivePivot application only uses a fraction of the resources (we find that this is often the case when several ActivePivot solutions run on the same server). For instance, an ActivePivot solution may use only 16 cores out of 64 and 256GB out of a terabyte. In that case you can restrict the JVM process itself to a single NUMA node.

On Linux, prefix the JVM launch command with numactl ( http://linux.die.net/man/8/numactl ):

numactl --cpunodebind=xxx
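
A slightly fuller sketch of that prefix is shown below. Note that --cpunodebind on its own only restricts which cores the threads run on; adding --membind (also a standard numactl option) restricts memory allocation to the same node. The node number, heap size and jar name are illustrative assumptions, and with --membind the heap has to fit in that node's memory.

```shell
#!/bin/sh
# Build a numactl-prefixed launch command for one NUMA node.
# The node number and the java arguments are placeholders, not values
# from the original answer.
numa_bind_cmd() {
  node="$1"; shift
  # --cpunodebind keeps the threads on the node's cores;
  # --membind keeps memory allocation on the same node.
  printf 'numactl --cpunodebind=%s --membind=%s %s\n' "$node" "$node" "$*"
}

# Print the command for node 0 (review it, then run it with eval or paste it).
numa_bind_cmd 0 java -Xmx200g -jar activepivot-app.jar
```

Running `numactl --hardware` first lists the nodes and how much memory each one has, which helps in choosing the node and sizing the heap.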

If the entire server is dedicated to one ActivePivot solution, you can leverage the ActivePivot Distributed Architecture to partition the data. If there are 4 NUMA nodes, you start 4 JVMs hosting 4 ActivePivot nodes, each one bound to its NUMA node. With this deployment queries are distributed among the nodes, and each node will perform its share of the work at max performance, within the right NUMA node.
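
The one-JVM-per-node deployment described above could be scripted roughly as follows. This is only a sketch: it prints the four launch commands instead of running them, the heap size and jar name are hypothetical, and the ActivePivot-side distribution settings each node would need are not shown.

```shell
#!/bin/sh
# Print one launch command per NUMA node (commands are printed, not executed).
# The node count, heap size and jar name are illustrative assumptions.
per_node_cmds() {
  nodes="$1"; heap="$2"; jar="$3"
  n=0
  while [ "$n" -lt "$nodes" ]; do
    printf 'numactl --cpunodebind=%d --membind=%d java -Xmx%s -jar %s\n' \
      "$n" "$n" "$heap" "$jar"
    n=$((n + 1))
  done
}

per_node_cmds 4 100g activepivot-node.jar
```

Each JVM's heap should stay below the memory physically attached to its node, otherwise the memory binding cannot be satisfied.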

Antoine CHAMBILLE answered Oct 19 '22 08:10


You can try using -XX:+UseNUMA

http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html

If this doesn't yield the result you expect, you might have to use taskset to lock each JVM to a specific socket, effectively breaking the server into four machines with one JVM each.
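
A minimal sketch of the taskset approach follows, assuming a Linux host where each socket shows up as one NUMA node under /sys/devices/system/node (this is common but not guaranteed). The helper names, the CPU range and the jar name are hypothetical. taskset only pins CPUs; it relies on the kernel's default policy of allocating memory near the running thread rather than binding memory explicitly.

```shell
#!/bin/sh
# Read the CPU list of one NUMA node from sysfs.
# sysroot defaults to the Linux sysfs path; it is a parameter only so the
# function can be pointed at another directory.
node_cpulist() {
  node="$1"; sysroot="${2:-/sys/devices/system/node}"
  cat "$sysroot/node$node/cpulist"
}

# Build a taskset-prefixed command that pins a process to the given CPUs.
taskset_cmd() {
  cpus="$1"; shift
  printf 'taskset -c %s %s\n' "$cpus" "$*"
}

# Example with a hard-coded CPU list (assumption: cores 0-15 sit on socket 0).
taskset_cmd 0-15 java -jar activepivot-app.jar
```

On a real machine the hard-coded range would be replaced with `"$(node_cpulist 0)"`, repeated for each socket.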

I have observed that machines with more sockets have slower access to their memory (even their local memory) and, as a result, do not always give you the performance gains you want.

Peter Lawrey answered Oct 19 '22 06:10