
Optimal JVM settings for Cassandra

I have a 4-node cluster; each box has a 16-core CPU and 100 GB of RAM (2 nodes per rack).

As of now, all are running with default JVM settings of Cassandra (v2.1.4). With this setting, each node uses 13GB RAM and 30% CPU. It is a write heavy cluster with occasional deletes or updates.

Do I need to tune the JVM settings of Cassandra to utilize more memory? What all things should I be looking at to make appropriate settings?

dvl asked May 13 '15 07:05




2 Answers

Do I need to tune the JVM settings of Cassandra to utilize more memory?

The DataStax Tuning Java Resources doc actually has some pretty sound advice on this:

Many users new to Cassandra are tempted to turn up Java heap size too high, which consumes the majority of the underlying system's RAM. In most cases, increasing the Java heap size is actually detrimental for these reasons:

  • In most cases, the capability of Java to gracefully handle garbage collection above 8GB quickly diminishes.
  • Modern operating systems maintain the OS page cache for frequently accessed data and are very good at keeping this data in memory, but can be prevented from doing its job by an elevated Java heap size.

If you have more than 2GB of system memory, which is typical, keep the size of the Java heap relatively small to allow more memory for the page cache.

With 100GB of RAM on each machine (and assuming you really are running with the "default JVM settings"), your JVM max heap size should already be capped at 8192M. I wouldn't deviate from that unless you are experiencing issues with garbage collection.
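To see where 8192M comes from, here is the arithmetic for a box like the one in the question (100GB RAM, 16 cores). This is only a sketch of the default sizing rule as I understand it from the 2.1 cassandra-env.sh; the function names default_max_heap_mb and default_new_gen_mb are made up for this example, and 102400 MB is a round stand-in for what `free -m` would report.

```shell
#!/bin/bash
# Hypothetical helpers that mirror the default sizing rule (all values in MB).

# Heap: max(min(ram/2, 1024), min(ram/4, 8192))
default_max_heap_mb() {
    local ram_mb=$1
    local half=$(( ram_mb / 2 ))
    local quarter=$(( ram_mb / 4 ))
    if [ "$half" -gt 1024 ]; then half=1024; fi
    if [ "$quarter" -gt 8192 ]; then quarter=8192; fi
    if [ "$half" -gt "$quarter" ]; then echo "$half"; else echo "$quarter"; fi
}

# Young generation: min(100 MB * cores, heap/4)
default_new_gen_mb() {
    local heap_mb=$1 cores=$2
    local by_cores=$(( 100 * cores ))
    local by_heap=$(( heap_mb / 4 ))
    if [ "$by_heap" -gt "$by_cores" ]; then echo "$by_cores"; else echo "$by_heap"; fi
}

heap=$(default_max_heap_mb 102400)    # assumed: 100GB box, in MB
echo "MAX_HEAP_SIZE=${heap}M"
echo "HEAP_NEWSIZE=$(default_new_gen_mb "$heap" 16)M"
```

For those inputs the quarter-of-RAM branch hits its 8192 cap, and the young generation is limited by the core count (100 MB x 16) rather than by a quarter of the heap.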

JVM resources for Cassandra can be set in the cassandra-env.sh file. If you are curious, open cassandra-env.sh and look for the calculate_heap_sizes() function. That should give you some insight into how Cassandra computes your default JVM settings.
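If you do end up overriding the defaults, the two variables to set in cassandra-env.sh are MAX_HEAP_SIZE and HEAP_NEWSIZE. The fragment below is only an illustration: the values are placeholders (they happen to match what the defaults would work out to for an 8GB heap on 16 cores), not a recommendation. As far as I recall for the 2.1 line, the script expects both to be set or both left unset, and refuses to start if only one is given.

```shell
# Fragment for conf/cassandra-env.sh (placeholder values, assumed for illustration).
# Set both variables or neither; leaving both unset keeps the computed defaults.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="1600M"
```

A restart of the node is needed for the new values to take effect.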

What all things should I be looking at to make appropriate settings?

If you are running OpsCenter (and you should be), add a graph for "Heap Used" and "Non Heap Used."

[Image: OpsCenter graphing Heap Used and Non Heap Used together]

This will allow you to easily monitor JVM heap usage for your cluster. Another thing that helped me was to write a bash script that duplicates the heap calculations from cassandra-env.sh. That way I can run it on a new machine and know right away what my MAX_HEAP_SIZE and HEAP_NEWSIZE are going to be:

#!/bin/bash
clear
echo "This is how Cassandra will determine its default Heap and GC Generation sizes."

system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`

echo "   memory = $system_memory_in_mb"
echo "     half = $half_system_memory_in_mb"
echo "  quarter = $quarter_system_memory_in_mb"

echo "cpu cores = "`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`

#cassandra-env logic duped here
#this should help you to see how much memory is being allocated
#to the JVM
    if [ "$half_system_memory_in_mb" -gt "1024" ]
    then
        half_system_memory_in_mb="1024"
    fi
    if [ "$quarter_system_memory_in_mb" -gt "8192" ]
    then
        quarter_system_memory_in_mb="8192"
    fi
    if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
    then
        max_heap_size_in_mb="$half_system_memory_in_mb"
    else
        max_heap_size_in_mb="$quarter_system_memory_in_mb"
    fi
    MAX_HEAP_SIZE="${max_heap_size_in_mb}M"

    # Young gen: min(max_sensible_per_modern_cpu_core * num_cores, 1/4 * heap size)
    max_sensible_yg_per_core_in_mb="100"
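    # system_cpu_cores must be set before use; expr needs \* and no parentheses
    system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`
    max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb \* $system_cpu_cores`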

    desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
    if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
    then
        HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
    else
        HEAP_NEWSIZE="${desired_yg_in_mb}M"
    fi

echo "Max heap size = " $MAX_HEAP_SIZE
echo " New gen size = " $HEAP_NEWSIZE
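
If OpsCenter is not available, heap usage can also be checked from the command line with `nodetool info`. The helper below is a rough sketch: heap_used_pct is a name invented for this example, and it assumes the output contains a line of the form "Heap Memory (MB) : used / total", which is what I remember from 2.1 but may differ in other versions.

```shell
#!/bin/bash
# Hypothetical helper: reads `nodetool info` output on stdin and prints
# heap usage as a whole-number percentage.
# Assumes a line like: "Heap Memory (MB)       : 392.82 / 1996.81"
heap_used_pct() {
    # Split on ':' and '/', so $2 is the used value and $3 the total.
    # The ^ anchor skips the separate "Off Heap Memory (MB)" line.
    awk -F'[:/]' '/^Heap Memory/ && ($3 + 0) > 0 { printf "%d\n", ($2 / $3) * 100 }'
}

# Usage on a live node (not run here):
#   nodetool info | heap_used_pct
```

Run from cron or a loop on each node, this gives a crude trend of heap pressure without any extra tooling.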

Update 20160212:

Also, be sure to check out Amy Tobey's 2.1 Cassandra Tuning Guide. She has some great tips on how to get your cluster running optimally.

Aaron answered Sep 23 '22 20:09



The script as first posted never set system_cpu_cores, so the young generation calculation could not run. Here is a version that sets it:

#!/bin/bash
clear
echo "This is how Cassandra will determine its default Heap and GC Generation sizes."

system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`
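# Count only the "processor : N" lines; a bare "grep -i processor" can also match model names
system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`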
echo "   memory = $system_memory_in_mb"
echo "     half = $half_system_memory_in_mb"
echo "  quarter = $quarter_system_memory_in_mb"

echo "cpu cores = $system_cpu_cores"

#cassandra-env logic duped here
#this should help you to see how much memory is being allocated
#to the JVM
if [ "$half_system_memory_in_mb" -gt "1024" ]
then
    half_system_memory_in_mb="1024"
fi
if [ "$quarter_system_memory_in_mb" -gt "8192" ]
then
    quarter_system_memory_in_mb="8192"
fi
if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
then
    max_heap_size_in_mb="$half_system_memory_in_mb"
else
    max_heap_size_in_mb="$quarter_system_memory_in_mb"
fi
MAX_HEAP_SIZE="${max_heap_size_in_mb}M"

# Young gen: min(max_sensible_per_modern_cpu_core * num_cores, 1/4 * heap size)
max_sensible_yg_per_core_in_mb="100"
# Escape * so the shell passes it to expr instead of expanding it to file names
max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb \* $system_cpu_cores`
desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
then
    HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
else
    HEAP_NEWSIZE="${desired_yg_in_mb}M"
fi

echo "Max heap size = " $MAX_HEAP_SIZE
echo " New gen size = " $HEAP_NEWSIZE
mannoj answered Sep 21 '22 20:09
