Cassandra: what is the correct configuration for EC2 multi-region?

What is the correct configuration for a multi-region setup on EC2 instances?

What should listen_address, broadcast_address, rpc_address and the seed IP addresses be set to for this to work?

When do you use public IP addresses and when do you use private IP addresses?

Andrew asked Sep 10 '13 07:09


1 Answer

According to the docs:

broadcast_address: (Default: listen_address) If your Cassandra cluster is deployed across multiple Amazon EC2 regions and you use the Ec2MultiRegionSnitch, set the broadcast_address to the public IP address of the node and the listen_address to the private IP.

listen_address: (Default: localhost) The IP address or hostname that other Cassandra nodes use to connect to this node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS. Do not specify 0.0.0.0.

rpc_address: (Default: localhost) The listen address for client connections (Thrift remote procedure calls).

seed_provider: (Default: org.apache.cassandra.locator.SimpleSeedProvider) A list of comma-delimited hosts (IP addresses) to use as contact points when a node joins a cluster. Cassandra also uses this list to learn the topology of the ring. When running multiple nodes, you must change the - seeds list from the default value (127.0.0.1). In multiple data-center clusters, the - seeds list should include at least one node from each data center (replication group).

Trying to summarize:

  1. rpc_address is used for client connections and has nothing to do with the multi-region EC2 setup
  2. listen_address and broadcast_address are the two options that matter for a multi-region EC2 configuration
  3. in general, when configuring any of these, answer two questions:

    1. who is connecting? (other nodes? clients?)
    2. which IPs are accessible? (is this network interface reachable by whoever is connecting?)
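
To make the summary concrete, here is a minimal sketch of the relevant cassandra.yaml entries for one node. All IP addresses are made-up placeholders (10.0.1.15 as this node's private IP, 203.0.113.10 as its public IP, 198.51.100.20 as the public IP of a seed in a second region). Using public IPs in the seeds list is my understanding of what Ec2MultiRegionSnitch expects, since nodes in another region cannot reach private addresses; it is not stated in the docs quoted above, so verify it against the documentation for your Cassandra version.

```yaml
# cassandra.yaml on one node (illustrative values only, not a complete file)

endpoint_snitch: Ec2MultiRegionSnitch

# Private IP of this instance: the local interface Cassandra binds to.
listen_address: 10.0.1.15

# Public IP of this instance: the address advertised to nodes in other regions.
broadcast_address: 203.0.113.10

# Address for client connections; unrelated to the multi-region setup.
# Assumption: clients run in the same region and can reach the private IP.
rpc_address: 10.0.1.15

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # At least one seed per region (data center).
          # Assumption: public IPs, so seeds are reachable across regions.
          - seeds: "203.0.113.10,198.51.100.20"
```

This follows the two questions above: other nodes (possibly in another region) connect over the public address, so that is what gets broadcast, while the node itself can only bind to the private interface that actually exists on the instance. The storage port also has to be open between regions in the EC2 security groups for the nodes to reach each other.
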
Alex Popescu answered Sep 20 '22 22:09