Preventing Cassandra from dumping hprof files

I would like to stop Cassandra from dumping hprof files as I do not require the use of them.

I also have very limited disk space (50GB out of 100 GB is used for data), and these files swallow up all the disk space before I can say "stop".

How should I go about it?

Is there a shell script that I could use to erase these files from time to time?

asked Feb 02 '12 by Salocin.TEN

3 Answers

It happens because Cassandra starts with the -XX:+HeapDumpOnOutOfMemoryError Java option, which is useful if you want to analyze those crashes. Also, if you are getting lots of heap dumps, that indicates you should probably tune the memory available to Cassandra.

I haven't tried it, but to disable this option, comment out the following line in $CASSANDRA_HOME/conf/cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"

Optionally, you may comment out this block as well, though I don't think it is strictly required. The block appears in 1.0+ versions; I can't find it in 0.7.3:

# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
    JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
fi
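
For reference, here is roughly what the edited cassandra-env.sh would look like with both pieces commented out (a sketch; the exact contents vary between Cassandra versions):

# Heap dumps on OutOfMemoryError disabled -- disk space is limited
# JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"

# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
# if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
#     JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
# fi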

Let me know if this worked.


Update

...I guess it is JVM throwing it out when Cassandra crashes / shuts down. Any way to prevent that one from happening?

If you want to disable JVM heap dumps altogether, see this related question: how to disable creating java heap dump after VM crashes?
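
If you prefer not to comment lines out, HotSpot also accepts the negated form of the flag, and you can redirect any dump that does get written. A minimal sketch, assuming a HotSpot JVM and the same cassandra-env.sh:

# explicitly disable heap dumps on OutOfMemoryError (note the minus sign)
JVM_OPTS="$JVM_OPTS -XX:-HeapDumpOnOutOfMemoryError"

# or keep the feature but write dumps where they cannot fill the disk
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/dev/null"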

answered by Nishant

I'll admit I haven't used Cassandra, but from what I can tell, it shouldn't be dumping any hprof files unless you enable that option at startup or the program experiences an OutOfMemoryError. So try looking there.

In terms of a shell script, if the files are being dumped to a specific location, you can use this command to delete all *.hprof files:

find /my/location/ -name '*.hprof' -delete

This uses the -delete action of find, which deletes every file matching the search. Note that the '*.hprof' pattern is quoted so the shell doesn't expand it before find sees it. Look at the man page for find for more search options if you need to narrow it down further.

You can use cron to run a script at a given time, which satisfies your "time to time" requirement. Most Linux distros have cron installed and work off a crontab file; you can find out more about it with man crontab. A sketch of a suitable entry follows.
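
For example, a crontab entry along these lines would clear the dumps every hour (a sketch; /var/lib/cassandra is an assumed dump location, so point it wherever your hprof files actually land):

# m h dom mon dow   command
0 * * * * find /var/lib/cassandra -name '*.hprof' -delete

Add it with crontab -e as a user with permission to delete the files.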

answered by Aatch


Even if you update cassandra-env.sh to point to a different heap-dump path, it may still not work. The reason is that the init script /etc/init.d/cassandra contains this line, which sets the default heap-dump path:

start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
    -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || return 2

I'm not an init-script expert, but what I did was simply remove the parameter that creates the duplicate (a sketch of the edited line follows). Another odd observation: when you check the Cassandra process via ps aux, you'll see some parameters written twice. Yet if you source cassandra-env.sh and print $JVM_OPTS, those options look fine, so the duplication comes from the init script.
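
For reference, the edited invocation would look roughly like this (a sketch based on the line quoted above; only the -H argument and its value are dropped):

start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
    -p "$PIDFILE" -E "$error_log_f" >/dev/null || return 2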

answered by Superpaul