How to prevent Cassandra commit logs filling up disk space

I'm running a two-node DataStax AMI cluster on AWS. Yesterday, Cassandra started refusing all connections. The system logs showed nothing. After a lot of tinkering, I discovered that the commit logs had filled up all the disk space on the allotted mount, and this seemed to be causing the connection refusals: once I deleted some of the commit logs and restarted, I was able to connect again.

I'm on DataStax AMI 2.5.1 and Cassandra 2.1.7.

If I decide to wipe and restart everything from scratch, how do I ensure that this does not happen again?

asked Jul 30 '15 20:07

plamb

2 Answers

You could try lowering the commitlog_total_space_in_mb setting in your cassandra.yaml. The default is 8192MB for 64-bit systems (it should be commented out in your .yaml file... you'll have to uncomment it when setting it). It's usually a good idea to plan for that much commit log space when sizing your disk(s).

You can verify this by running a du on your commitlog directory:

$ du -d 1 -h ./commitlog
8.1G    ./commitlog

Keep in mind, though, that a smaller commit log space will cause more frequent flushes (increased disk I/O), so you'll want to keep an eye on that.
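If you'd rather script the du check than run it by hand, here is a minimal Python sketch. It is only an illustration: the path /var/lib/cassandra/commitlog and the function names are assumptions (use whatever commitlog_directory is set to in your cassandra.yaml), and it sums apparent file sizes, so the number can differ slightly from what du reports.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (apparent sizes, not disk blocks)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # A segment can be deleted while we walk the directory; skip it.
                pass
    return total


def commitlog_report(path, limit_mb=8192):
    """Compare the directory size against a limit given in MB.

    limit_mb should mirror commitlog_total_space_in_mb from cassandra.yaml;
    8192 is the 64-bit default mentioned above.
    """
    used_mb = dir_size_bytes(path) / (1024 * 1024)
    return {"used_mb": used_mb, "limit_mb": limit_mb, "over_limit": used_mb > limit_mb}


if __name__ == "__main__":
    # Hypothetical location; adjust to your own commitlog_directory.
    print(commitlog_report("/var/lib/cassandra/commitlog"))
```

A directory that sits well above the configured limit, or keeps growing between runs, is a sign that segments are not being cleaned up.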

Edit 20190318

Just had a related thought (on my 4-year-old answer). I saw that it received some attention recently, and wanted to make sure that the right information is out there.

It's important to note that sometimes the commit log can grow in an "out of control" fashion. Essentially, this can happen because the write load on the node exceeds Cassandra's ability to keep up with flushing the memtables (and thus, removing old commitlog files). If you find a node with dozens of commitlog files, and the number seems to keep growing, this might be your issue.

In that case, your memtable_cleanup_threshold may be too low. Although this property is deprecated, you can still control how it is calculated: lowering the number of memtable_flush_writers raises the threshold.

memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)

The documentation has been updated as of 3.x, but used to say this:

# memtable_flush_writers defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
# 
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8

...which (I feel) led to many folks setting this value WAY too high.

Assuming memtable_flush_writers is set to 8, the memtable_cleanup_threshold is 0.111 (that is, 1/9). When the footprint of all memtables exceeds this ratio of total memory available, flushing occurs. Too many flush (blocking) writers can prevent this from happening quickly enough. With a single /data dir, I recommend setting memtable_flush_writers to 2.
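To see how the threshold moves with the number of flush writers, here is a small Python sketch of the formula quoted above. The function name is just for illustration; it is not part of Cassandra.

```python
def memtable_cleanup_threshold(memtable_flush_writers):
    """Apply the formula from above: 1 / (memtable_flush_writers + 1)."""
    if memtable_flush_writers < 1:
        raise ValueError("memtable_flush_writers must be at least 1")
    return 1.0 / (memtable_flush_writers + 1)


if __name__ == "__main__":
    for writers in (1, 2, 4, 8):
        threshold = memtable_cleanup_threshold(writers)
        print(f"{writers} flush writers -> threshold {threshold:.3f}")
```

With 8 writers this gives 0.111, and with 2 writers it gives 0.333, so the lower writer count lets memtables take up a larger share of memory before a flush is triggered.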

answered Sep 23 '22 13:09

Aaron

In addition to decreasing the commitlog size as suggested by BryceAtNetwork23, a proper solution to ensure it won't happen again should include monitoring of the disks, so that you are alerted when one is getting full and have time to act or increase the disk size.

Seeing as you are using DataStax, you could set an alert for this in OpsCenter. Haven't used this within the cloud myself, but I imagine it would work. Alerts can be set by clicking Alerts in the top banner -> Manage Alerts -> Add Alert. Configure the mounts to watch and the thresholds to trigger on.

Or, I'm sure there are better tools to monitor disk space out there.
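If you just want a quick stopgap while you set up proper monitoring, a disk-space check can be sketched in a few lines of Python using the standard library's shutil.disk_usage. The mount path, the 80% threshold, and the function names below are assumptions for illustration; you would run something like this from cron and hook it up to whatever notification mechanism you already have.

```python
import shutil


def disk_usage_percent(path):
    """Return the percentage of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def should_alert(path, threshold_percent=80.0):
    """True when usage on the filesystem holding path is at or above the threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # Placeholder mount; point this at the mount that holds your commit logs.
    mount = "/"
    percent = disk_usage_percent(mount)
    if should_alert(mount):
        print(f"WARNING: {mount} is {percent:.1f}% full")
    else:
        print(f"OK: {mount} is {percent:.1f}% full")
```

Note that this percentage is computed as used/total, so it may not match the Use% column of df exactly on filesystems that reserve blocks for root.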

answered Sep 25 '22 13:09

Alec Collier

Tags: cassandra, datastax-java-driver, datastax, cassandra-2.1
