If I understand it correctly, upon a write request the write is sent to all N replicas, and the operation succeeds when the first W responses are received. Is this correct?
If it is, then combined with Hinted Handoff, it seems that all replicas will already get all writes as soon as possible, do we really have to do read repair in this case?
Thanks.
Repair synchronizes the data between nodes by comparing their respective datasets for their common token ranges, and streaming the differences for any out of sync sections between the nodes. It compares the data with merkle trees, which are a hierarchy of hashes.
Read Repair is the process of repairing data replicas during a read request. If all replicas involved in a read request at the given read consistency level are consistent the data is returned to the client and no read repair is needed.
Most times read performance when using Cassandra gets decreased when some operations are done wrongly such as index interval, bloom filter false positive, consistency level, read repair chance, caching, compaction, data modeling and cluster deployment.
Ongoing repair operations in Cassandra ensure that all replicas of a row will eventually be consistent. Repairs work to decrease the variability in replica data, but constant data traffic through a widely distributed system can lead to inconsistency (stale data) at any given time.
Short answer: you still need read repair.
Longer answer: there wasn't a good discussion of Hinted Handoff anywhere, so I wrote one.
For Cassandra 1.0+, read the updated article. The crucial part being:
At first glance, it may appear that Hinted Handoff lets you safely get away without needing repair. This is only true if you never have hardware failure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With