kubeadm init fails while initializing a Kubernetes cluster

CPeony · Published 2022-04-20 15:01:38

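Before you start

Check whether your error matches the one below before reading the solution. I am a beginner, and after a lot of searching I picked up a few troubleshooting techniques along the way.
To skip straight to the result, see the Solution section at the end.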

Environment
Tencent Cloud, CentOS 7

Init command
kubeadm init \
  --apiserver-advertise-address=ip \
  --kubernetes-version v1.18.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16

Troubleshooting

First, this is the error output from the failed run:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in docker:
                - 'docker ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

The error message suggests checking the logs with journalctl -xeu kubelet. Running it prints the following:

-- Logs begin at 二 2022-04-19 16:20:01 CST, end at 三 2022-04-20 13:28:56 CST. --
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.386568   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.486711   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.572931   31175 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: [Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: net/http: TLS handshake timeout, Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: dial tcp ip:6443: connect: connection refused]
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.586832   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.686950   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.788088   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: I0420 13:27:31.843736   31175 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
4月 20 13:27:31 k8s-master kubelet[31175]: I0420 13:27:31.845909   31175 kubelet_node_status.go:70] Attempting to register node k8s-master
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.846661   31175 kubelet_node_status.go:92] Unable to register node "k8s-master" with API server: Post https://ip:6443/api/v1/nodes: dial tcp ip:6443: connect: connection refused
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.888212   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.988321   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:32 k8s-master kubelet[31175]: E0420 13:27:32.005184   31175 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: dial tcp ip:6443: connect: connection refused

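There are two kinds of errors here, and I looked into both:

node "k8s-master" not found: this is a downstream symptom, not the root cause, so chasing it does not help.
Failed to initialize CSINodeInfo, dial tcp ip:6443: connect: connection refused: the kubelet cannot reach the API server on port 6443, so initialization fails.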
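
Because the same lines repeat many times, it can help to collapse the journal into distinct error messages with counts. The following is a minimal sketch and not part of the original troubleshooting. The function name summarize_kubelet_errors is hypothetical, and it assumes the log line layout seen in the output above (severity letter plus MMDD, timestamp, PID, then source file and message).

```shell
#!/usr/bin/env bash
# Hypothetical helper: group repeated kubelet error lines.
# Reads journal text on stdin, so it also works on a saved log file.
summarize_kubelet_errors() {
  # Keep only error-severity lines ("E" followed by MMDD and a timestamp),
  # strip everything up to and including the PID so identical messages
  # compare equal, then count each distinct message, most frequent first.
  grep -E ' E[0-9]{4} [0-9:.]+ ' \
    | sed -E 's/^.* E[0-9]{4} [0-9:.]+ +[0-9]+ //' \
    | sort | uniq -c | sort -rn
}

# Usage (on the node where kubelet runs):
#   journalctl -u kubelet --no-pager | summarize_kubelet_errors
```

With the log above, this would leave a handful of distinct messages, which makes the connection refused errors on port 6443 stand out from the repeated "not found" lines.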

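At this point it is fairly clear that the apiserver never came up, and docker ps -a also shows the apiserver container as exited. While searching for "connection refused", I found an article that suggested checking docker logs, so I printed the apiserver container's log: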

W0420 06:27:37.750969       1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...

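The end of the log is the apiserver repeatedly trying to connect to port 2379. As that article pointed out, 2379 is the etcd port, so the apiserver fails because etcd failed to start.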
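
To confirm that nothing is listening on the etcd port, the listening sockets can be checked directly. This is a minimal sketch, assuming the ss utility from iproute2 is available on the host; port_is_listening is a hypothetical helper name, not something provided by kubeadm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if some TCP socket is listening
# on the given port, fail (exit 1) otherwise.
port_is_listening() {
  local port="$1"
  # ss -lnt lists listening TCP sockets with numeric ports; the 4th column
  # is "Local Address:Port". Skip the header row and match ":<port>" at
  # the end of that column so that e.g. 12379 does not match 2379.
  ss -lnt | awk -v p=":$port" 'NR > 1 && $4 ~ (p "$") { found = 1 } END { exit !found }'
}

# Usage:
#   port_is_listening 2379 && echo "etcd port is open" || echo "nothing on 2379"
```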

So next we look at the docker logs of the etcd container:

2022-04-20 06:31:13.533316 C | etcdmain: listen tcp ip:2380: bind: cannot assign requested address

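The last line is the error: etcd cannot bind the requested address. I first suspected the security group, but that was not the cause. Further searching led to the answer in a GitHub issue.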
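
The chain followed above (kubelet, then apiserver, then etcd) can be walked in one pass by printing the last log lines of every exited Kubernetes container. This is a minimal sketch built on the docker ps -a | grep kube | grep -v pause command that kubeadm itself suggests; the function name show_exited_kube_logs and the default of 20 lines are my own choices, and it assumes docker is the container runtime.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tail of the logs of every exited
# Kubernetes container (pause containers excluded).
show_exited_kube_logs() {
  local tail_lines="${1:-20}"   # number of log lines per container
  docker ps -a --filter status=exited \
    | grep kube | grep -v pause \
    | awk '{print $1, $NF}' \
    | while read -r id name; do
        # First column is the container ID, last column is its name.
        echo "=== $name ($id) ==="
        docker logs --tail "$tail_lines" "$id" 2>&1 </dev/null
      done
}

# Usage:
#   show_exited_kube_logs        # last 20 lines of each exited container
#   show_exited_kube_logs 5      # last 5 lines
```

In a case like this one, the etcd section of the output should end with the bind error, which points at the root cause without opening each container's log by hand.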

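Solution

This is a public-cloud issue. The --apiserver-advertise-address flag of kubeadm init must be set to the instance's private (internal) IP, not its public IP. On cloud instances the public IP is typically mapped through NAT and is not assigned to any network interface on the host, which is why etcd cannot bind to it.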
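
One way to catch this before running kubeadm init is to check whether the address you plan to advertise is actually configured on a local interface. This is a minimal sketch, assuming the ip command from iproute2 is installed; is_local_ipv4 is a hypothetical helper name and 10.0.0.5 in the usage comment is only a placeholder address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the given IPv4 address is
# assigned to a network interface on this host.
is_local_ipv4() {
  local addr="$1"
  # "ip -o -4 addr show" prints one line per address; the 4th field is
  # "ADDRESS/PREFIX". Drop the prefix length and require an exact,
  # whole-line, fixed-string match.
  ip -o -4 addr show | awk '{print $4}' | cut -d/ -f1 | grep -qxF -- "$addr"
}

# Usage (10.0.0.5 is a placeholder for your private IP):
#   if is_local_ipv4 10.0.0.5; then
#     echo "address is local, safe to use for --apiserver-advertise-address"
#   else
#     echo "address is not on any interface, etcd will fail to bind to it"
#   fi
```

A public IP on a cloud instance would normally fail this check, while the private IP shown by ip addr would pass.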
