
In a Kubernetes cluster, is there a way to migrate etcd from external to internal?

I made a Kubernetes cluster one year ago with an external etcd cluster (3 members).

At the time, I did not know that it was possible to make etcd internal, so I made an external cluster and connected Kubernetes to it.

Now I see that an internal etcd cluster is possible, and it is a cleaner solution because the etcd nodes are updated when you update your Kubernetes cluster.

I can't find a clean way to migrate an external etcd cluster to an internal one. I hope there is a solution with zero downtime. Do you know if this is possible?

Thank you for your response and have a nice day!

Nurza asked Oct 27 '22


1 Answer

As I understand it, you have a 3-member etcd cluster that is external from the Kubernetes cluster's perspective. The expected outcome is to have all three members running on the Kubernetes master nodes. Some details are not disclosed, so I will explain several possible options.

First of all, there are several reasonable ways to run the etcd process used as the Kubernetes control-plane key-value storage:

  • etcd runs as a static pod, with its startup configuration in the /etc/kubernetes/manifests/etcd.yaml file
  • etcd runs as a systemd service defined in /etc/systemd/system/etcd.service or a similar file
  • etcd runs as a Docker container configured using command line options (this solution is not really safe unless you make sure the container is restarted after a failure or host reboot; see the sketch after this list)
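
If you do run etcd as a Docker container, a minimal sketch of a restart-safe setup could look like the following; the image tag, member name, IP address and the choice to skip TLS are all assumptions for the example, not something from the question:

# restart policy keeps the member coming back after failures or host reboots
# (TLS flags --cert-file/--key-file/--trusted-ca-file and the peer equivalents are omitted here)
docker run -d --name etcd \
  --restart unless-stopped \
  -p 2379:2379 -p 2380:2380 \
  -v /var/lib/etcd:/var/lib/etcd \
  quay.io/coreos/etcd:v3.4.13 \
  etcd --name etcd-member-1 \
    --data-dir /var/lib/etcd \
    --listen-client-urls http://0.0.0.0:2379 \
    --advertise-client-urls http://10.128.0.12:2379 \
    --listen-peer-urls http://0.0.0.0:2380 \
    --initial-advertise-peer-urls http://10.128.0.12:2380 \
    --initial-cluster etcd-member-1=http://10.128.0.12:2380 \
    --initial-cluster-state new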

For experimental purposes, you can also run etcd:

  • as a simple process in Linux userspace
  • as a StatefulSet in the Kubernetes cluster
  • as an etcd cluster managed by etcd-operator.

My personal recommendation is to have a 5-member etcd cluster: 3 members run as static pods on 3 Kubernetes master nodes, and two more run as static pods on external (Kubernetes-cluster-independent) hosts. In this case you will still have quorum if at least one master node is running, or if you lose the two external nodes for any reason.
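
For illustration, the --initial-cluster list for such a 5-member topology could look like the snippet below; every member name and IP address here is a made-up placeholder, not taken from the question:

# 3 masters + 2 external hosts; member names and IPs are placeholders
# (remaining flags -- data-dir, client/peer URLs, TLS -- stay as in your current configuration)
etcd --name master-1 \
  --initial-cluster master-1=https://10.128.0.11:2380,master-2=https://10.128.0.12:2380,master-3=https://10.128.0.13:2380,external-1=https://10.128.0.21:2380,external-2=https://10.128.0.22:2380 \
  --initial-cluster-state existing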

There are at least two ways to migrate an etcd cluster from external instances to the Kubernetes cluster master nodes. They work in the opposite direction too.

Migration

It's quite a straightforward way to migrate the cluster. During this procedure members are turned off (one at a time), moved to another host and started again. Your cluster shouldn't have any problems as long as you still have quorum in the etcd cluster. My recommendation is to have an etcd cluster of at least 3, or better 5, nodes to make the migration safer. For bigger clusters it may be more convenient to use the other solution from my second answer.

The process of moving etcd member to another IP address is described in the official documentation:

To migrate a member:

  1. Stop the member process.
  2. Copy the data directory of the now-idle member to the new machine.
  3. Update the peer URLs for the replaced member to reflect the new machine according to the runtime reconfiguration instructions.
  4. Start etcd on the new machine, using the same configuration and the copy of the data directory.

Now let's look closer on each step:

0.1 Ensure your etcd cluster is healthy and all members are in good condition. I would also recommend checking the logs of all etcd members, just in case.

(To successfully run the following commands, please refer to step 3 for the auth variables and aliases.)

# the last two commands only show the members specified with the --endpoints command line option
# the following commands are supposed to run with root privileges because the certificates are not accessible by a regular user

e2 cluster-health
e3 endpoint health
e3 endpoint status

0.2 Check each etcd member's configuration and find out where the etcd data-dir is located, then ensure that it will remain accessible after the etcd process terminates. In most cases it's located under /var/lib/etcd on the host machine and is either used directly or mounted as a volume into the etcd pod or Docker container.
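
For example, on a kubeadm-style master (the manifest path below is the common default, not something stated in the question) you can locate the data-dir like this:

# for a static pod: look at the --data-dir flag and the hostPath volumes
sudo grep -E 'data-dir|hostPath|path:' /etc/kubernetes/manifests/etcd.yaml

# for any running etcd process: inspect its command line
ps -ef | grep '[e]tcd' | tr ' ' '\n' | grep data-dir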

0.3 Create a snapshot of each etcd cluster member; it's better to have it and never need it than to need it and not have it.
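
A minimal sketch of taking such a snapshot, reusing the $ETCDAUTH3 variable from step 3; the local endpoint and backup path are assumptions, adjust them to your member:

# snapshot save must be pointed at a single member endpoint
mkdir -p /var/backups/etcd
ETCDCTL_API=3 etcdctl $ETCDAUTH3 --endpoints https://127.0.0.1:2379 snapshot save /var/backups/etcd/snapshot-$(date +%F).db
# quick sanity check of the snapshot file
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd/snapshot-$(date +%F).db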

1. Stop the etcd member process.

If you use kubelet to start etcd, as recommended here, move the etcd.yaml file out of /etc/kubernetes/manifests/. Right after that, the etcd Pod will be terminated by kubelet:

sudo mv /etc/kubernetes/manifests/etcd.yaml ~/
sudo chmod 644 ~/etcd.yaml 
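
To confirm the static pod is really gone, you can check the container runtime on that node (the commands assume Docker or containerd as the runtime):

docker ps | grep etcd
# or, on a containerd-based node
sudo crictl ps | grep etcd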

If you run the etcd process as a systemd service, you can stop it using the following command:

sudo systemctl stop etcd-service-name.service

In the case of a Docker container, you can stop and remove it using the following commands:

docker ps -a 
docker stop <etcd_container_id>
docker rm <etcd_container_id>

If you run the etcd process from the command line, you can kill it using the following command:

kill `pgrep etcd`

2. Copy the data directory of the now-idle member to the new machine.

Not much complexity here. Archive the etcd data-dir into a file and copy it to the destination instance. I also recommend copying the etcd manifest or systemd service configuration if you plan to run etcd on the new instance in the same way.

tar -C /var/lib -czf etcd-member-name-data.tar.gz etcd
tar -czf etcd-member-name-conf.tar.gz [etcd.yaml] [/etc/systemd/system/etcd.service]  [/etc/kubernetes/manifests/etcd.conf ...]
scp etcd-member-name-data.tar.gz destination_host:~/
scp etcd-member-name-conf.tar.gz destination_host:~/

3. Update the peer URLs for the replaced member to reflect the new member IP address according to the runtime reconfiguration instructions.

There are two ways to do it: by using the etcd API or by running the etcdctl utility.

This is how the etcdctl way may look:
(replace the etcd endpoint variables with the correct etcd cluster member IP addresses)

# all etcd cluster members should be specified
export ETCDSRV="--endpoints https://etcd.ip.addr.one:2379,https://etcd.ip.addr.two:2379,https://etcd.ip.addr.three:2379"
#authentication parameters for v2 and v3 etcdctl APIs
export ETCDAUTH2="--ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file  /etc/kubernetes/pki/etcd/peer.key"
export ETCDAUTH3="--cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key"

# etcdctl API v3 alias
alias e3="ETCDCTL_API=3 etcdctl $ETCDAUTH3 $ETCDSRV"
# etcdctl API v2 alias
alias e2="ETCDCTL_API=2 etcdctl $ETCDAUTH2 $ETCDSRV"

# list all etcd cluster members and their IDs
e2 member list

e2 member update member_id http://new.etcd.member.ip:2380
#or
e3 member update member_id --peer-urls="https://new.etcd.member.ip:2380"

This is how the etcd API way may look:

export CURL_ETCD_AUTH="--cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt"

curl https://health.etcd.instance.ip:2379/v2/members/member_id -XPUT -H "Content-Type: application/json" -d '{"peerURLs":["http://new.etcd.member.ip:2380"]}' ${CURL_ETCD_AUTH}

4. Start etcd on the new machine, using the adjusted configuration and the copy of the data directory.

Unpack etcd data-dir on the new host:

tar -xzf etcd-member-name-data.tar.gz -C /var/lib/

Adjust the etcd startup configuration according to your needs. At this point it's easy to select another way to run etcd. Depending on your choice, prepare the manifest or service definition file and replace the old IP address there with the new one. E.g.:

sed -i  's/\/10.128.0.12:/\/10.128.0.99:/g' etcd.yaml

Now it's time to start etcd, either by moving etcd.yaml to /etc/kubernetes/manifests/, or by running the following command (if you run etcd as a systemd service):

sudo systemctl start etcd-service-name.service

5. Check the updated etcd process logs and the etcd cluster health to ensure that the member is healthy.

To do that you can use the following commands:

$ e2 cluster-health

$ kubectl logs etct_pod_name -n kube-system

$ docker logs etcd_container_id 2>&1 | less

$ journalctl -e -u etcd_service_name     
VASャ answered Nov 09 '22