I'm planning to deploy several cassandra nodes using docker containers. If each node is inside a separate docker container, can I still build a cluster with these nodes?
I'm thinking I might have many problems because opening all the necessary ports might be difficult as some of them are random.
Be careful to all readers, both the question and the selected answer are wrong at many levels. Let me explain why.
Firstly, ports don't have to, and in most cases, are not random. Ports exposed by the container are defined in EXPOSE
instructions in the Dockerfile.
It is only if you decide to publish those ports to your host using the -P
option, that they are chosen randomly on the host. But usually ports are mapped manually using the lower case -p
option. Publishing ports on the host is only necessary if you want your container to be called from other hosts. This will be typically useful if you decide to create a Cassandra cluster on multiple hosts.
The question also failed to specify if the containers are to be on the same node, or spread across multiple nodes. These are completely different things to implement with Docker.
There is a Cassandra official image on Docker Hub. You usually better try the official image when there is one.
The suggested unofficial image has several flaws. I have not tested it, but I can guess from its Dockerfile and its init.sh script that:
The person who asked the question was most likely playing around, but I will try to answer the question seriously, because it's more interesting that way !
On a single host
Deploying Cassandra on a single host doesn't make much sense, because it's meant to be scaled horizontally. But if you are going to do it, you should still do it properly !
You should have multiple disks on your host if you want a good performance. The steps to do are:
Multiple Hosts - Single Cassandra node on each host
If you are going to have only a single container on each separate host, there is an easy solution which is to use --net=host
on your containers to make them use the host's network stack instead of putting them on a bridge network. This will make your container act as if they are the host, network-wise. They will have the same IP as their host and can be called by other hosts.
This technique is illustrated on a real cluster of physical nodes in this blog. The blog post also show what workarounds you can use if you don't want to use the host network and still want to use the default bridge network on each host.
Multiple Hosts - Multiple Cassandra nodes on each host
The easiest way to achieve this is to use an overlay network. This will make all containers across all nodes be on the same virtual network. They can then talk to each other transparently as if they were on the same node. But this requires using more advanced Docker tools, and deploying a key-value store service.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With