Handling errors while deploying a Kubernetes cluster on a VM cluster with a Calico network

May 19, 2020

My labmate (a senior colleague) and I are trying to install Kubernetes for research, and we decided to first install it on a cluster of VirtualBox machines.

Virtual Box Machine cluster

First of all, we created a cluster in VirtualBox. It includes a master node and 2 slave nodes, all running Ubuntu 18.04. Each machine uses one network interface for local connections and another one to access the Internet. My labmate set this part up, so I do not know its details well.

Network interface of the master. enp0s3 for cluster internal connection and enp0s8 for internet connection.

Install kubeadm toolbox

The official Kubernetes website provides a comprehensive guide to installing kubeadm (reference 1). Hence I will only mention some of the steps and the errors I encountered.

To eliminate as many errors as possible, I recommend carefully checking the requirements listed on the website. Also, verify the network connection among the nodes, the connection between each node and the internet, and the certificates.

Install kubeadm, kubelet and kubectl

We will install these packages on all of the machines:

kubeadm: the command to bootstrap the cluster.
kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers.
kubectl: the command line util to talk to your cluster.
kubernetes-cni: Kubernetes uses CNI (Container Network Interface) as an interface between network providers and Kubernetes networking.

No valid OpenPGP data found

We start by adding the repository signing key with this command:

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

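Sometimes this command fails with the following error, which means apt-key did not receive a valid key (the -s flag makes curl hide any download error):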

gpg: no valid OpenPGP data found

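In that case, we can download the apt-key.gpg file with wget and --no-check-certificate, then add the key file separately. Keep in mind that --no-check-certificate skips TLS certificate verification, so use it only as a workaround.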

$ wget --no-check-certificate https://packages.cloud.google.com/apt/doc/apt-key.gpg
$ sudo apt-key add apt-key.gpg

Update repositories issue

Next, we need to add the software repository (source) from which the necessary packages are downloaded and installed.

$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ sudo apt-get update
$ sudo apt-get install -y kubelet kubeadm kubectl kubernetes-cni

However, because the address is redirected, errors can occur when running the update command. For example:

W: The repository 'https://apt.kubernetes.io kubernetes-xenial Release' does not have a Release file

Instead, we can specify the exact address as follows.

$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb http://packages.cloud.google.com/apt/ kubernetes-xenial main
EOF
$ sudo apt-get update
$ sudo apt-get install -y kubelet kubeadm kubectl kubernetes-cni

Start the cluster

Now we can start using the kubeadm command. Let's initialize our cluster:

$ sudo kubeadm init

Some errors may appear during the preflight checks. Some people recommend skipping them by adding the flag --ignore-preflight-errors=<list-of-errors>. However, I strongly recommend not ignoring preflight errors, so that debugging is easier if something goes wrong: eventually, any of these errors can be the reason the Kubernetes cluster does not work properly.

Permission denied while trying to connect to the Docker daemon socket

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json: dial unix /var/run/docker.sock: connect: permission denied

The solution is to add the user to the docker group, as described in the Docker post-installation guide: https://docs.docker.com/engine/install/linux-postinstall/
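As a minimal sketch of that fix, the snippet below checks whether the current user already belongs to the docker group and, if not, prints the commands from the Docker post-installation guide instead of running them. The helper name has_docker_group is only illustrative.

```shell
# Return 0 if the space-separated group list in $1 contains "docker".
has_docker_group() {
  case " $1 " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

if has_docker_group "$(id -nG)"; then
  echo "current user is already in the docker group"
else
  # Printed rather than executed, so nothing changes until you run them yourself.
  echo 'sudo usermod -aG docker "$USER"'
  echo 'newgrp docker   # or log out and log back in'
fi
```

The group change only takes effect in a new login session, which is why the second printed command (or logging out and back in) is needed.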

[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/

Just follow the guide that the warning points to.
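For reference, that guide switches Docker to the systemd cgroup driver through /etc/docker/daemon.json. The sketch below writes only that one setting to a temporary file for review. The helper name write_daemon_json is illustrative, and copying the file over an existing /etc/docker/daemon.json would discard any settings already there.

```shell
# Write a minimal Docker daemon config that selects the systemd cgroup driver.
write_daemon_json() {
  cat > "$1" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
}

# Write to a temporary file first so the result can be reviewed.
DAEMON_JSON_PREVIEW=$(mktemp)
write_daemon_json "$DAEMON_JSON_PREVIEW"
cat "$DAEMON_JSON_PREVIEW"

# To apply it (assumption: no other settings are needed in daemon.json):
#   sudo cp "$DAEMON_JSON_PREVIEW" /etc/docker/daemon.json
#   sudo systemctl restart docker
```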

Error Swap: running with swap on is not supported

[ERROR Swap]: running with swap on is not supported. Please disable swap

Use this command to turn off swap:

$ sudo swapoff -a
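To confirm that swap is really off before rerunning kubeadm init, one option on Linux is to look at /proc/swaps, which lists active swap areas below a header line. A small sketch (the helper name swap_active is illustrative):

```shell
# Return 0 if the swaps table in file $1 lists at least one active swap area.
# The first line of /proc/swaps is a header, so only later lines count.
swap_active() {
  awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "$1"
}

if [ -r /proc/swaps ]; then
  if swap_active /proc/swaps; then
    echo "swap is still on; run: sudo swapoff -a"
  else
    echo "swap is off"
  fi
fi
```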

x509: certificate signed by unknown authority

Sometimes, you may get this error:

Get https://k8s.gcr.io/v2/: x509: certificate signed by unknown authority

A temporary workaround is to allow an insecure connection to the registry: https://docs.docker.com/registry/insecure/

Other problems come from the network. Hence, verify the network connection and the certificates, and check that all nodes can connect to the internet.

Deploying the cluster with the Calico network

Applying Calico pod network

When kubeadm init finishes properly, the output looks like this:

$ sudo kubeadm init
...
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
  kubeadm join 10.244.0.4:6443 --token 1l9bdg.yseidooa0w6q5h66 \
    --discovery-token-ca-cert-hash sha256:36cdd42f3bac72c9a22e2a3d40983af10ddd99e5d3b9a13104ed7dbcd9503da2

Just follow the instructions: after creating the .kube folder and copying the config, we need to select a pod network for the cluster. In this post, I select Calico. To use Calico, we should initialize the cluster with the flag --pod-network-cidr, as in the following example. You can find a quickstart guide on the Calico website (reference 4).

$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
...
$ kubectl apply -f calico.yaml
...

Then, ssh into each slave node and join the cluster:

kube@slave:$ sudo kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>

If everything works fine, the result will look like this:

All nodes in Kubernetes cluster after joining slave nodes into cluster.

All pods in Kubernetes cluster after joining slave nodes into cluster.

Debugging deployment errors

Sometimes errors occur. Kubernetes provides commands to inspect the state and the logs of the pods in the deployment, which can be used to figure out the errors.

$ kubectl describe pod <pod-name> -n kube-system
$ kubectl logs <pod-name> -n kube-system
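When many pods are listed, it can help to narrow the output down to the ones that need attention before calling describe or logs on them. A rough sketch that filters the default kubectl get pods table by its STATUS column (the helper name unhealthy_pods is illustrative, and the column position assumes the default output format):

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS (third column) is neither Running nor Completed.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage on the master node:
#   kubectl get pods -n kube-system --no-headers | unhealthy_pods
```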

In my case, the Calico node could not be created properly, and I got this error:

Calico node 'node-name' is already using the IPv4 address X.X.X.X

The reason is that each machine in the cluster has several network interfaces, so I have to define the interface on which the nodes can find each other. This can be done by specifying the network interface as the value of the environment variable IP_AUTODETECTION_METHOD in the YAML file of the pod network.

Modifying IP_AUTODETECTION_METHOD environment variable
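Besides editing the YAML file, Calico's documentation also describes setting this variable on the running calico-node DaemonSet with kubectl set env. The sketch below only prints that command for the cluster-internal interface used in this post (enp0s3) so it can be reviewed first. The helper name is illustrative, and the DaemonSet name and namespace assume the default calico.yaml manifest.

```shell
# Cluster-internal interface of the VMs in this post.
IFACE="enp0s3"

# Print (not run) the command that points Calico at one interface.
calico_autodetect_cmd() {
  printf 'kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=%s\n' "$1"
}

calico_autodetect_cmd "$IFACE"
```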

Now, the complete commands to initialize the cluster are as follows:

$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=<address-of-master-node-on-the-interface-defined-on-IP-AUTODETECTION-METHOD>
...
$ kubectl apply -f calico.yaml
...
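The address passed to --apiserver-advertise-address is the master's IPv4 address on the chosen interface. One way to look it up is with the ip tool. The sketch below parses its one-line output format (the helper name first_ipv4 is illustrative, and enp0s3 is the cluster-internal interface from this post).

```shell
# Extract the first IPv4 address from `ip -o -4 addr show` output on stdin.
first_ipv4() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Usage on the master node:
#   ADDR=$(ip -o -4 addr show dev enp0s3 | first_ipv4)
#   sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address="$ADDR"
```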

Finally, follow the instructions (in reference 2 or 4) to complete the deployment.

Reset the cluster

Resetting the cluster takes a few steps:

$ sudo kubeadm reset
...
$ # Remove $HOME/.kube directory
...
$ # Remove CNI plugin in /opt/cni/bin and /etc/cni/net.d
...

Restart the cluster

Every time the (physical or virtual) machines restart, the Kubernetes cluster goes down. However, we only need to turn off swap and restart kubelet to bring the Kubernetes cluster back up.

$ # Turn off swap
$ sudo swapoff -a
$ # Restart kubelet
$ sudo systemctl restart kubelet
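Swap typically comes back after a reboot because it is still listed in /etc/fstab. A common way to make swapoff permanent is to comment out the swap entries there. The sketch below only transforms text from stdin to stdout, so the result can be inspected before anything is written. The helper name comment_out_swap is illustrative.

```shell
# Prefix every active (uncommented) swap entry with "#".
comment_out_swap() {
  sed -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/'
}

# Preview the change without touching the real file:
#   comment_out_swap < /etc/fstab
# Apply it, keeping a backup (review the preview first):
#   sudo cp /etc/fstab /etc/fstab.bak
#   comment_out_swap < /etc/fstab.bak | sudo tee /etc/fstab > /dev/null
```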

References

1. Install kubeadm guide: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
2. Creating a single control-plane cluster with kubeadm: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
3. Redirection issue: https://askubuntu.com/questions/1100800/kubernetes-installation-failing-ubuntu-16-04
4. Quickstart for Calico on Kubernetes: https://docs.projectcalico.org/getting-started/kubernetes/quickstart
5. Calico-node on worker nodes with ‘CrashLoopBackOff’: https://github.com/projectcalico/calico/issues/2720
6. Restart issue: https://stackoverflow.com/questions/51375940/kubernetes-master-node-is-down-after-restarting-host-machine