Before you start

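I suggest first checking whether your error matches mine, and only then reading the solution. I'm a beginner, but after a lot of searching online I picked up a few troubleshooting techniques along the way.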

Environment
Tencent Cloud, CentOS 7

Init command (ip is a placeholder for the address that was passed)
kubeadm init \
  --apiserver-advertise-address=ip \
  --kubernetes-version v1.18.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16

Troubleshooting

First, here is the error output from the failed run:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in docker:
                - 'docker ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

The error message suggests running journalctl -xeu kubelet, so I printed the kubelet logs. The output is as follows:

-- Logs begin at Tue 2022-04-19 16:20:01 CST, end at Wed 2022-04-20 13:28:56 CST. --
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.386568   31175 kubelet.go:2267] node "k8s-master" not found
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.486711   31175 kubelet.go:2267] node "k8s-master" not found
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.572931   31175 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: [Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: net/http: TLS handshake timeout, Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: dial tcp ip:6443: connect: connection refused]
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.586832   31175 kubelet.go:2267] node "k8s-master" not found
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.686950   31175 kubelet.go:2267] node "k8s-master" not found
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.788088   31175 kubelet.go:2267] node "k8s-master" not found
Apr 20 13:27:31 k8s-master kubelet[31175]: I0420 13:27:31.843736   31175 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Apr 20 13:27:31 k8s-master kubelet[31175]: I0420 13:27:31.845909   31175 kubelet_node_status.go:70] Attempting to register node k8s-master
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.846661   31175 kubelet_node_status.go:92] Unable to register node "k8s-master" with API server: Post https://ip:6443/api/v1/nodes: dial tcp ip:6443: connect: connection refused
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.888212   31175 kubelet.go:2267] node "k8s-master" not found
Apr 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.988321   31175 kubelet.go:2267] node "k8s-master" not found
Apr 20 13:27:32 k8s-master kubelet[31175]: E0420 13:27:32.005184   31175 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: dial tcp ip:6443: connect: connection refused

There are two kinds of errors here, and I searched for both of them:

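node "k8s-master" not found: a downstream error. Searching for it got me nowhere, because it is not the root cause.
Failed to initialize CSINodeInfo, dial tcp ip:6443: connect: connection refused: the kubelet cannot reach the API server on port 6443, meaning initialization failed because nothing is accepting connections there.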
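The journal is dominated by repeated lines, which makes the few distinct errors easy to miss. A minimal Python sketch for collapsing it by source location; summarize_klog is a hypothetical helper name, and the regex assumes the klog header format seen in the log above (for example E0420 13:27:31.386568   31175 kubelet.go:2267]).

```python
import re
from collections import Counter

# klog header: severity letter + MMDD, time, thread id, then "file.go:line] message".
KLOG_RE = re.compile(r"([IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ ([\w.-]+:\d+)\] (.*)")


def summarize_klog(lines):
    """Count error/fatal klog entries per source location, keeping the first message of each."""
    counts = Counter()
    samples = {}
    for line in lines:
        m = KLOG_RE.search(line)
        if not m:
            continue  # not a klog line, e.g. the "-- Logs begin at ..." header
        severity, location, message = m.groups()
        if severity not in ("E", "F"):
            continue  # skip informational (I) and warning (W) entries
        counts[location] += 1
        samples.setdefault(location, message)
    return counts, samples


def print_summary(lines):
    """Print the most frequent error sources first, with one sample message each."""
    counts, samples = summarize_klog(lines)
    for location, n in counts.most_common():
        print(f"{n:5d}  {location}  {samples[location][:100]}")
```

For example, save the journal with journalctl -u kubelet --no-pager > kubelet.log and call print_summary(open("kubelet.log")). For the excerpt above it would list kubelet.go:2267 (node "k8s-master" not found) 7 times, csi_plugin.go:271 twice and kubelet_node_status.go:92 once, so the connection-refused errors stand out next to the noisy one.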

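At this point it was fairly clear that the apiserver had not started, and docker ps -a confirmed that the apiserver container had exited (Exited). While searching for "connection refused", I found an article suggesting to check the container logs with docker logs, so I printed the apiserver logs: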

W0420 06:27:37.750969       1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...

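The log ends with the apiserver endlessly retrying port 2379. As that article points out, 2379 is etcd's client port, so the apiserver failed because etcd had failed to start.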
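To confirm which link in the chain is down without reading container logs, you can also probe the two ports directly on the master node. A minimal Python sketch; port_open is a hypothetical helper name, and 6443 and 2379 are the apiserver and etcd client ports that appear in the logs above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            # connect_ex returns 0 on success and an errno such as
            # ECONNREFUSED ("connection refused") when nothing is listening.
            return s.connect_ex((host, port)) == 0
        except OSError:
            # connect_ex can still raise for some failures (e.g. name resolution)
            return False


if __name__ == "__main__":
    # 127.0.0.1:2379 is where the apiserver log shows it trying to reach etcd.
    for name, port in (("kube-apiserver", 6443), ("etcd", 2379)):
        state = "listening" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name:15s} :{port}  {state}")
```

In the failure described here, both ports should report "not reachable": etcd never bound its ports, so the apiserver could not start either.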

So I looked at the etcd container's Docker logs:

2022-04-20 06:31:13.533316 C | etcdmain: listen tcp ip:2380: bind: cannot assign requested address

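The last line is the real error: cannot assign requested address. I first suspected the security group, but that was not it. After more searching I found the answer in a GitHub issue.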

Solution

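It is a public-cloud issue: kubeadm's --apiserver-advertise-address must be set to the instance's private (internal) IP, not its public IP. On a cloud VM the public IP is typically mapped to the instance through NAT and is not configured on any of its network interfaces, so etcd cannot bind its peer port 2380 to that address, and the apiserver, which depends on etcd, never comes up.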
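Before re-running kubeadm init, you can check whether an address is actually owned by the machine (ip addr shows the same information). A minimal Python sketch, assuming Linux with the default net.ipv4.ip_nonlocal_bind=0; can_bind and outbound_ip are hypothetical helper names. can_bind repeats the kind of bind that failed for etcd, and outbound_ip is a common heuristic for finding the private IP, not a kubeadm feature; the 8.8.8.8 probe address is an assumption, and any routable address works.

```python
import errno
import socket


def can_bind(ip: str) -> bool:
    """Return True if this host can bind a TCP socket to ip.

    Binding to an address that no local interface owns fails with
    EADDRNOTAVAIL, the "cannot assign requested address" error etcd hit.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, 0))  # port 0: let the kernel choose any free port
        except OSError as e:
            if e.errno == errno.EADDRNOTAVAIL:
                return False
            raise  # other errors (e.g. malformed address) are not "not local"
    return True


def outbound_ip(probe: str = "8.8.8.8") -> str:
    """Guess the local IP used for outbound traffic (usually the private IP on a cloud VM).

    connect() on a UDP socket only selects a route; no packet is sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((probe, 80))
        return s.getsockname()[0]
```

On an affected VM, can_bind with the public IP should return False while can_bind(outbound_ip()) returns True; the latter is the value to pass as --apiserver-advertise-address.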