
Way to determine healthy Cassandra cluster?

I've been tasked with rewriting some sub-par Ansible playbooks that stand up a Cassandra cluster on CentOS. Quite frankly, there doesn't seem to be much information on Cassandra out there.

I've managed to get the service running on all three nodes at the same time, using the following configuration file (info scrubbed):

HOSTIP=10.0.0.1
MSIP=10.10.10.10
[email protected]
LICENSE_FILE=/tmp/license.conf
USE_LDAP_REMOTE_HOST=n

ENABLE_AX=y
MP_POD=gateway

REGION=test-1

USE_ZK_CLUSTER=y
ZK_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
ZK_CLIENT_HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"

USE_CASS_CLUSTER=y
CASS_HOSTS="10.0.0.1:1,1 10.0.0.2:1,1 10.0.0.3:1,1"
CASS_USERNAME=test
CASS_PASSWORD=test

The HOSTIP changes depending on which node the configuration file is on.

The problem is, when I run nodetool ring, each node reports only two nodes in the cluster: itself and one other, seemingly picked at random from the remaining two.

What are some basic sanity checks to determine a "healthy" Cassandra cluster? Why is nodetool saying each one thinks there's a different node missing from the cluster?

asked by Roman

1 Answer

What you really want to check is whether all the nodes agree on the schema version. nodetool status shows whether nodes are up, down, or joining, but that alone does not mean the cluster is 'healthy' enough to make schema changes or other changes. The simplest check is nodetool describecluster:

Cluster Information:
        Name: FooBarCluster
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2, 10.136.2.3]

If the schema versions do not match, you need to wait for the schema to settle or run repairs. A mismatch looks like this:

43fe9177-382c-327e-904a-c8353a9df590: [10.136.2.1, 10.136.2.2]
43fe9177-382c-327e-904a-c8353a9dxxxx: [10.136.2.3]

However, running nodetool is 'heavy', and its output is hard to parse.

The same information is available inside the database itself:

SELECT schema_version, release_version FROM system.local;
SELECT peer, schema_version, release_version FROM system.peers;

Then compare schema_version across all nodes: if they match, the cluster is very likely healthy. You should ALWAYS check this before making any changes to the schema.
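
For example, you could script that check with the DataStax Python driver (cassandra-driver). This is only a minimal sketch; the contact points and the test/test credentials below are just the values from your config and should be adjusted:

# pip install cassandra-driver
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Contact points and credentials taken from the question's config (assumptions, adjust as needed).
auth = PlainTextAuthProvider(username='test', password='test')
cluster = Cluster(['10.0.0.1', '10.0.0.2', '10.0.0.3'], auth_provider=auth)
session = cluster.connect()

# schema_version of the node we connected to...
local = session.execute("SELECT schema_version, release_version FROM system.local").one()
# ...and of every peer it knows about.
peers = session.execute("SELECT peer, schema_version, release_version FROM system.peers").all()

versions = {str(local.schema_version)} | {str(p.schema_version) for p in peers}
if len(versions) == 1:
    print("Schema in agreement:", versions.pop())
else:
    print("Schema mismatch, versions seen:", versions)

cluster.shutdown()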

Now, during a rolling upgrade, when nodes are on different engine versions, release_version will differ between them, so to support automated rolling upgrades you need to check that schema_version matches within each release_version separately.
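
A rough way to handle that case, in the same spirit as the sketch above (the function name here is just illustrative), is to group schema versions by release_version and require exactly one schema_version per release:

from collections import defaultdict
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

def schema_agreement_by_release(session):
    # Collect (release_version, schema_version) for the local node and all peers,
    # then check that each release has exactly one schema version.
    local = session.execute("SELECT schema_version, release_version FROM system.local").one()
    peers = session.execute("SELECT peer, schema_version, release_version FROM system.peers").all()

    by_release = defaultdict(set)
    by_release[local.release_version].add(str(local.schema_version))
    for p in peers:
        by_release[p.release_version].add(str(p.schema_version))

    # True per release means "schema in agreement within that release".
    return {release: len(schemas) == 1 for release, schemas in by_release.items()}

# Same assumed contact points/credentials as above.
cluster = Cluster(['10.0.0.1', '10.0.0.2', '10.0.0.3'],
                  auth_provider=PlainTextAuthProvider(username='test', password='test'))
print(schema_agreement_by_release(cluster.connect()))
cluster.shutdown()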

answered by Taveren Tech