Cassandra keyspace does not propagate to newly added node (after previous successful additions and removals)

Tags:

cassandra

I was testing rotating through a 4-node cluster, adding and removing nodes in a cyclic manner so the members of the cluster adhered to the following repeating sequence

Node addition was performed by stopping Cassandra, wiping /var/lib/cassandra/*, and restarting Cassandra (with the same cassandra.yaml file, which listed nodes 1 and 2 as seeds). Node removal was performed by stopping Cassandra and then issuing nodetool removenode $nodeId from another node. In all cases, the next operation was not started until the previous one had completed.

The above sequence of node members repeated several times until, after about 4 iterations, I was performing an "add node" operation to transition from a cluster of nodes {1, 2} to a cluster of nodes {1, 2, 3}. On this iteration, my custom keyspace failed to propagate to node 3. nodetool status looked fine:

$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.12.206  164.88 KB  256     66.2%             7018ef8a-af08-40e9-b3d3-065f4ba6eb0d  rack1
UN  192.168.12.207  60.85 KB   256     63.2%             ff18b636-6287-4c70-bf23-0a1a1814b864  rack1
UN  192.168.12.205  217.19 KB  256     70.6%             2bc38fa8-42a1-457f-84d7-35b3b46e1daa  rack1

But cqlsh on node 3 didn't know about my keyspace. I tried to run nodetool repair, which seemed to loop infinitely, while spewing the following couple of stacks in the log:

WARN [Thread-9781] 2014-09-16 19:34:30,081 IncomingTcpConnection.java (line 83) UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=08768b1d-97a1-3528-8191-9acee7b08ef4
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:178)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:103)
        at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:145)
        at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:134)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
        at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
ERROR [Thread-9782] 2014-09-16 19:34:31,484 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-9782,5,main]
java.lang.NullPointerException
        at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:247)
        at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:156)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
        at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)

Any ideas what's going on and how to fix this (ideally, a reliable working repair and a way to avoid entering this state in the first place)?

asked Sep 17 '14 02:09

jonderry

1 Answer

If there is a schema version disagreement, you can tell by running nodetool describecluster.
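As a rough illustration of that check, the sketch below parses text in the shape that nodetool describecluster usually prints and reports whether more than one schema version is present. The sample text, the UUIDs in it, and the function names are made up for this example, and the exact output layout (including an UNREACHABLE entry for nodes that cannot be contacted) is an assumption that may differ between Cassandra versions.

```python
import re

# Illustrative sample only (UUIDs are invented); in practice this text
# would be the captured output of `nodetool describecluster`.
SAMPLE = """\
Cluster Information:
    Name: Test Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        59adb24e-f3cd-3e02-97f0-5b395827453f: [192.168.12.206, 192.168.12.207]
        2207c2a9-f598-3971-986b-2926e09e239d: [192.168.12.205]
"""

# Assumed line shape: "<uuid>: [host, host, ...]" or "UNREACHABLE: [host, ...]".
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(text):
    """Map each schema version found in the text to its list of hosts."""
    versions = {}
    for line in text.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def in_agreement(versions):
    """True when exactly one schema version is reported by reachable nodes."""
    live = [v for v in versions if v != "UNREACHABLE"]
    return len(live) == 1


versions = schema_versions(SAMPLE)
for version, hosts in versions.items():
    print(version, hosts)
print("schema agreement:", in_agreement(versions))
```

In the sample, the host listed alone under its own version would be the one to reset.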

If one node reports a different schema version from the rest, do the following on the node that has the wrong version:

Run nodetool drain, then stop the Cassandra service/process, typically with sudo service cassandra stop or kill <pid>. At the end of this process the commit log directory (/var/lib/cassandra/commitlog) should contain only a single small file.

Remove the Schema* and Migration* sstables inside of your system keyspace (/var/lib/cassandra/data/system, if you're using the defaults).
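Before deleting anything, it may help to list which files would be affected. The following is a minimal dry-run sketch, assuming the default data path mentioned above and assuming the relevant files have names beginning with Schema or Migration; the function name is hypothetical, and the on-disk layout of the system keyspace varies between Cassandra versions, so the result should be reviewed by hand. It only lists files and removes nothing.

```python
import os


def find_schema_sstables(system_dir, prefixes=("Schema", "Migration")):
    """Return sorted paths under system_dir whose file names start with a prefix.

    This is a dry run: it only collects paths and never deletes them.
    """
    matches = []
    for root, _dirs, files in os.walk(system_dir):
        for name in files:
            if name.startswith(prefixes):
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    # Default location from the answer; adjust if data_file_directories differs.
    for path in find_schema_sstables("/var/lib/cassandra/data/system"):
        print(path)
```

Run it only after Cassandra has been stopped on that node, so the listing reflects the files that will actually be on disk when you remove them.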

After starting Cassandra again, this node will notice the missing information and pull in the correct schema from one of the other nodes. In version 1.0.X and before, the schema is applied one mutation at a time. While it is being applied, the node may log messages, such as the one below, saying that a Column Family cannot be found. These messages can be ignored.

ERROR [MutationStage:1] 2012-05-18 16:23:15,664 RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1012

To confirm everything is on the same schema, verify that 'describe cluster;' only returns one schema version.

Source: https://wiki.apache.org/cassandra/FAQ

answered Oct 12 '22 23:10

phact
