Cassandra two nodes with redundancy

Tags:

cassandra

I have set up two servers that both run Cassandra, following the documentation on the DataStax website. My current setup has:

1 seed node (configured in both yamls)

When running, both nodes are up (checked via nodetool) and both seem to have the data replicated correctly, but I've noticed that when I bring down the seed node, the other node doesn't accept client connections (neither via the API nor via cqlsh), which is a problem.

My requirement is to have two servers that are perfect replicas of each other, so that if one server is temporarily down (due to a disk failure, for instance), the other can take over the requests until the broken server comes back online.

Given this requirement, I have the following questions:

  1. Do I need to set up both nodes as "seed" nodes?
  2. How would I make sure everything is replicated across both servers? Does this happen automatically, or is there some setting I need to configure?

Many thanks in advance,

asked Jan 05 '15 by kha


1 Answer

Cassandra does not do master-slave replication. There is no master in Cassandra. Rather, data is distributed across the cluster, and the distribution mechanism depends on a number of things.

Data is stored on nodes in partitions. Remember, Cassandra is a partitioned row store; that's where partitions come in. All rows for a partition are stored together on a single node (plus its replicas). How many replicas depends on the replication factor of the table's keyspace. If the replication factor is 3, each partition (and, as such, all rows in that partition) is stored on two additional replicas. It's like saying, "I want 3 copies of this data".
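
For illustration, here is a minimal cqlsh sketch (the myapp keyspace, the events table and its columns are hypothetical names, not anything from the question): the replication factor is declared on the keyspace, and every table in it inherits it.

    -- Hypothetical keyspace: 'replication_factor': 3 means "keep 3 copies
    -- of every partition"; SimpleStrategy is fine for a single data centre.
    CREATE KEYSPACE myapp
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    -- All rows sharing the same partition key (user_id) are stored together
    -- on one node, plus its two replicas.
    CREATE TABLE myapp.events (
        user_id  uuid,
        event_ts timestamp,
        payload  text,
        PRIMARY KEY (user_id, event_ts)
    );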

When writing, clients can specify a consistency level (CL): the number of replicas that must acknowledge the write for it to be considered successful. Clients can specify a CL for reads too: Cassandra issues the read to CL replicas and takes the most recent value as the query result.
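
As a rough sketch, reusing the hypothetical myapp.events table from above: in cqlsh, the CONSISTENCY command sets the level used by the session's subsequent requests (drivers let you set it per statement instead; toTimestamp(now()) assumes a reasonably recent Cassandra).

    -- Require a majority of replicas (2 of 3 when RF=3) to acknowledge.
    CONSISTENCY QUORUM;
    INSERT INTO myapp.events (user_id, event_ts, payload)
    VALUES (uuid(), toTimestamp(now()), 'login');

    -- Reads at ONE return as soon as a single replica answers.
    CONSISTENCY ONE;
    SELECT * FROM myapp.events LIMIT 10;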

By tuning read and write CLs, you control consistency. If Read CL + Write CL > Replication factor (RF), you get full consistency.

In terms of fault tolerance, you can tweak CLs and RF to be what you need. For example, if you have RF=3, Read CL=2, Write CL=2, then you have full consistency, and you can tolerate one node going down. For RF=5, Read CL=3, Write CL=3, you have the same, but can tolerate 2 nodes going down.

A two-node cluster is not really a good idea. You can set RF=2 (all data replicated), write CL=2 and read CL=1; however, this means that if a node is down, you can only read but not write. You can instead set read CL=2 and write CL=1, in which case, if a node goes down, you can write but not read. Realistically, you should go for at least 5 (at the very least 4) nodes with RF=3. Any lower than that and you're asking for trouble.
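
A minimal cqlsh sketch of that two-node trade-off (the keyspace name is again hypothetical):

    -- RF=2: every partition lives on both nodes.
    CREATE KEYSPACE pair_demo
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

    CONSISTENCY ALL;  -- both replicas must respond: strongly consistent,
                      -- but every request fails while either node is down
    CONSISTENCY ONE;  -- one replica is enough: requests survive a single
                      -- node outage, but a read may miss a recent write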

answered Sep 28 '22 by ashic