
How do I know if nodetool repair is finished?

I have a 2-node Apache Cassandra (2.0.3) cluster with a replication factor of 1. I changed the replication factor to 2 using the following command in cqlsh:

ALTER KEYSPACE "mykeyspace" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

I then tried to run the recommended "nodetool repair" after making this kind of change.

The problem is that this command sometimes finishes very quickly. When it does, it normally prints 'Lost notification...' and the exit code is non-zero.

So I just repeat 'nodetool repair' until it finishes without error. I also check that 'nodetool status' reports the expected disk usage for each node. (With a replication factor of 1, each node holds about 7GB, so after the repair I expect each node to hold about 14GB, assuming no cluster usage in the meantime.)

Is there a more correct way to determine that 'nodetool repair' is finished in this case?

user3865568 asked Jul 31 '14 16:07



2 Answers

Generally speaking, you can monitor a nodetool repair operation with two nodetool commands:

  • compactionstats
  • netstats

The repair operation has two distinct phases. First it calculates the differences between the nodes (repair work to be done), and then it acts on those differences by streaming data to the appropriate nodes.

This checks on the active Merkle tree calculations, which run as validation compactions:

$ nodetool compactionstats
pending tasks: 0
Active compaction remaining time :        n/a
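If you would rather not re-run that by hand, here is a minimal sketch that polls compactionstats until no validation task is listed. It assumes an active Merkle tree calculation shows up as a line containing "Validation" in the output. NODETOOL and POLL_SECONDS are illustrative variables I made up for the script, not nodetool settings. It only covers the first phase of the repair, so streaming may still be in progress after it returns.

```shell
# Hypothetical helper variables (not part of nodetool itself).
NODETOOL="${NODETOOL:-nodetool}"
POLL_SECONDS="${POLL_SECONDS:-10}"

# Succeeds (exit 0) while compactionstats lists a validation task.
# Assumption: such tasks appear on a line containing "Validation".
validation_running() {
    $NODETOOL compactionstats | grep -qi 'validation'
}

# Print a timestamp every POLL_SECONDS until no validation task is listed.
wait_for_validation() {
    while validation_running; do
        date
        sleep "$POLL_SECONDS"
    done
    echo "no validation compactions active"
}
```

Call wait_for_validation on the node being repaired, then move on to checking netstats for the streaming phase.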

The repair streams can be monitored by:

$ nodetool netstats 

In fact, TheLastPickle's Aaron Morton suggests using the following Bash script/command to monitor any active repair streams:

while true; do date; diff <(nodetool -h localhost netstats) <(sleep 5 && nodetool -h localhost netstats); done 

DataStax has a posting in their support forums about troubleshooting hanging repairs. If you have any hung repair streams, you should be able to see them with netstats. This can happen if one of your nodes becomes unavailable during the repair process. To check on specific repair operations, look in your log file for entries like this:

DEBUG [WRITE-/172.30.77.197] 2013-05-03 12:43:09,107 OutboundTcpConnection.java (line 165) error writing to /172.30.77.197
java.net.SocketException: Connection reset

Note that repair sessions should also be denoted in your system.log:

[repair #02fc68f0-210c-11e7-aa88-c35a9a02c19a] Starting...
[repair #02fc68f0-210c-11e7-aa88-c35a9a02c19a] Completed...
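As a rough sketch of how you might use those log entries, the function below lists every repair session ID that appears in a log file without a matching completion line. It assumes each session line carries a "[repair #<id>]" tag and that the final line for a session contains the word "completed" in any letter case. The exact message text varies between Cassandra versions, so adjust the pattern to whatever your system.log actually prints. The function name and the log path in the usage note are my own placeholders.

```shell
# List repair session tags seen in the log that have no completion line.
# Assumption: a finished session logs a line containing "completed".
unfinished_repairs() {
    awk '
        match($0, /\[repair #[0-9a-f-]+\]/) {
            id = substr($0, RSTART, RLENGTH)
            # Remember each session the first time it is seen, in log order.
            if (!(id in seen)) { seen[id] = 1; order[++n] = id }
            if (tolower($0) ~ /completed/) finished[id] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in finished)) print order[i]
        }
    ' "$1"
}
```

For example, unfinished_repairs /var/log/cassandra/system.log (the path depends on your install) prints nothing once every session it found has a completion line.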

Aaron answered Oct 05 '22 06:10


The repair streams can be monitored with the --trace option when you start the repair command:

nodetool repair --trace <key_space> <table>

tjeubaoit answered Oct 05 '22 06:10