Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cassandra inside Docker

I'm planning to deploy several cassandra nodes using docker containers. If each node is inside a separate docker container, can I still build a cluster with these nodes?

I'm thinking I might have many problems because opening all the necessary ports might be difficult as some of them are random.

like image 469
To Do Avatar asked Dec 01 '22 01:12

To Do


1 Answers

Be careful to all readers, both the question and the selected answer are wrong at many levels. Let me explain why.

Why the question is wrong

Firstly, ports don't have to, and in most cases, are not random. Ports exposed by the container are defined in EXPOSE instructions in the Dockerfile.

It is only if you decide to publish those ports to your host using the -P option, that they are chosen randomly on the host. But usually ports are mapped manually using the lower case -p option. Publishing ports on the host is only necessary if you want your container to be called from other hosts. This will be typically useful if you decide to create a Cassandra cluster on multiple hosts.

The question also failed to specify if the containers are to be on the same node, or spread across multiple nodes. These are completely different things to implement with Docker.

Why the answer is wrong

There is a Cassandra official image on Docker Hub. You usually better try the official image when there is one.

The suggested unofficial image has several flaws. I have not tested it, but I can guess from its Dockerfile and its init.sh script that:

  • It only supports 10 containers (this is explained by the author)
  • It relies on container linking, which is only for containers on the same host, and deprecated now anyways
  • It does not have the necessary options to configure the seeds, the broadcast IP, etc.. which you will need when deploying containers on separate hosts.
  • It does not have a mount point for the data. This means that if you have 10 containers, they must write to the same disk, which would ruin IO performance. Cassandra's strength is that it doesn't do random reads to avoid disk seek time.

So how to use Cassandra with Docker ?

The person who asked the question was most likely playing around, but I will try to answer the question seriously, because it's more interesting that way !

On a single host

Deploying Cassandra on a single host doesn't make much sense, because it's meant to be scaled horizontally. But if you are going to do it, you should still do it properly !

You should have multiple disks on your host if you want a good performance. The steps to do are:

  • Use the official image and mount a separate disk on each container.
  • Create a user-defined bridge network and put these containers on that same network so that they can communicate with each other, and even have automatic hostname resolution.
  • After the first container is creates, pass the CASSANDRA_SEEDS variable to each subsequent container to inform it of the IP or hostname of any other container already in the cluster to join.

Multiple Hosts - Single Cassandra node on each host

If you are going to have only a single container on each separate host, there is an easy solution which is to use --net=host on your containers to make them use the host's network stack instead of putting them on a bridge network. This will make your container act as if they are the host, network-wise. They will have the same IP as their host and can be called by other hosts.

This technique is illustrated on a real cluster of physical nodes in this blog. The blog post also show what workarounds you can use if you don't want to use the host network and still want to use the default bridge network on each host.

Multiple Hosts - Multiple Cassandra nodes on each host

The easiest way to achieve this is to use an overlay network. This will make all containers across all nodes be on the same virtual network. They can then talk to each other transparently as if they were on the same node. But this requires using more advanced Docker tools, and deploying a key-value store service.

like image 55
Nicomak Avatar answered Dec 28 '22 08:12

Nicomak