Skip to content

集群搭建

kubeadm概述

kubeadm是官方社区推出的一个用于快速部署kubernetes集群的工具。

这个工具能通过两条指令完成一个kubernetes集群的部署

sh
# 创建一个 Master 节点
kubeadm init

# 将一个 Node 节点加入到当前集群中
kubeadm join <Master节点的IP和端口 >

运行环境

节点地址版本操作系统
master192.168.31.1201.28.2Debian12
node1192.168.31.1211.28.2Debian12
node2192.168.31.1221.28.2Debian12

环境搭建

以下操作每台机器都执行

sh
# 关闭swap
# 临时
swapoff -a
# 永久关闭
sed -ri 's/.*swap.*/#&/' /etc/fstab

# 根据规划设置主机名【master节点上操作】
hostnamectl set-hostname master
# 根据规划设置主机名【node1节点操作】
hostnamectl set-hostname node1
# 根据规划设置主机名【node2节点操作】
hostnamectl set-hostname node2

# 在master添加hosts
cat >> /etc/hosts << EOF
192.168.31.120 master
192.168.31.121 node1
192.168.31.122 node2
EOF

# 将桥接的IPv4流量传递到iptables的链(可跳过)
cat <<EOF | tee /etc/sysctl.d/k8s.conf
vm.swappiness = 0
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
# 生效
sysctl --system

# 时间同步
apt install ntpdate -y
ntpdate time.windows.com

安装Containerd

添加进行源

sh
# 下载证书
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# 配置容器镜像源
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://mirrors.aliyun.com/docker-ce/linux/debian $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

# 更新源
apt-get upgrade && apt-get update

安装

sh
apt install containerd

Containerd 配置

sh
# 初始化配置
containerd config default > /etc/containerd/config.toml

# 修改配置
vim /etc/containerd/config.toml
# sandbox_image = "registry.k8s.io/pause:3.6" 更新配置如下
sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"

# SystemdCgroup = false 更新配置如下
SystemdCgroup = true

# 重启服务
systemctl restart containerd.service

# 查看服务状态
systemctl status containerd.service

# 查看组件版本信息
containerd -version
ctr version

Containerd使用

sh
# 拉取镜像
# 镜像名不能简写,--all-platforms:所有平台,不加的话下载当前平台架构
ctr images pull --platform linux/amd64 docker.io/library/nginx:1.22

# 查看镜像
ctr images ls
# resp
REF                          TYPE                                                      DIGEST                                                                  SIZE     PLATFORMS                                                                                               LABELS 
docker.io/library/nginx:1.22 application/vnd.docker.distribution.manifest.list.v2+json sha256:fc5f5fb7574755c306aaf88456ebfbe0b006420a184d52b923d2f0197108f6b7 54.4 MiB linux/386,linux/amd64,linux/arm/v5,linux/arm/v7,linux/arm64/v8,linux/mips64le,linux/ppc64le,linux/s390x - 

# 导出镜像
ctr images export --platform linux/amd64 nginx.tar docker.io/library/nginx:1.22

# 导入镜像
ctr i import nginx.tar --platform linux/amd64

# 删除镜像
ctr image rm docker.io/library/nginx:1.22

# k8s导出镜像
ctr --namespace k8s.io images export --platform linux/amd64 nginx.tar docker.io/library/nginx:1.22

# k8s导入镜像
ctr -n k8s.io i import nginx.tar --platform linux/amd64

docker 镜像和 containerd 镜像是一样的,可以相互导入导出

--platform:导入导出操作必须要加上平台

--namespace :命名空间,k8s.io是kubernetes的命名空间

安装kubeadm,kubelet和kubectl

添加kubernetes软件源

阿里云

sh
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | gpg --dearmor -o /usr/share/keyrings/kubernetes-archive-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list

中科大

sh
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | gpg --dearmor -o /usr/share/keyrings/kubernetes-archive-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] http://mirrors.ustc.edu.cn/kubernetes/apt kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list

安装指定版本

sh
# 更新源
apt-get update

# 安装最新版本
apt install -y kubelet kubeadm kubectl

# 安装kubelet、kubeadm、kubectl,同时指定版本
apt install kubelet=1.28.2-00 kubeadm=1.28.2-00 kubectl=1.28.2-00

# 设置开机启动
systemctl enable kubelet

部署Master节点

在master节点执行初始化操作

sh
kubeadm init \
--apiserver-advertise-address=192.168.31.120 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.28.2 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16

由于默认拉取镜像地址k8s.gcr.io国内无法访问,这里指定阿里云镜像仓库地址。

相关命令

sh
# 重置节点
kubeadm reset

# 删除指定节点
kubectl delete node node1

查看服务日志信息

sh
journalctl -xeu kubelet
journalctl -xe

出现下面的情况时,表示kubernetes的镜像已经安装成功

sh
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.31.120:6443 --token 0z1rjy.im58d7196ffqccpk \
        --discovery-token-ca-cert-hash sha256:9c72f989fffce252a65afd042c3480878b7b603d34eae2eede2cdcbc95b78486

使用kubectl工具

sh
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

执行完成后,查看我们正在运行的节点

sh
[root@master ~]# kubectl get nodes
NAME     STATUS     ROLES    AGE    VERSION
master   NotReady   master   2m1s   v1.18.0

加入Kubernetes Node

下面我们需要到 node1node2 服务器,执行下面的代码向集群添加新节点

执行在kubeadm init 输出的 kubeadm join 命令

sh
kubeadm join 192.168.31.120:6443 --token 0z1rjy.im58d7196ffqccpk \
        --discovery-token-ca-cert-hash sha256:9c72f989fffce252a65afd042c3480878b7b603d34eae2eede2cdcbc95b78486

默认token有效期为24小时,过期之后,该token就不可用了

Token相关命令

sh
# 查询token
kubeadm token list

# 重新创建token
kubeadm token create --print-join-command

当我们把两个节点都加入进来后,我们就可以去Master节点查看节点状态。

sh
[root@master ~]# kubectl get node
NAME     STATUS   ROLES           AGE     VERSION
master   NotReady    control-plane   4m54s   v1.28.2
node1    NotReady    <none>          4m15s   v1.28.2
node2    NotReady    <none>          3m55s   v1.28.2

安装CNI网络插件

上面的状态还是 NotReady,下面我们需要网络插件,来进行联网访问

sh
# 下载网络插件配置
curl -fsSL -O https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# 备用链接
curl -fsSL -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# 部署网络插件
kubectl apply -f kube-flannel.yml

# 删除配置,删除后可以重新部署
kubectl delete -f kube-flannel.yml

# 一键配置
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

测试运行

在Kubernetes集群中创建一个pod,验证是否正常运行

sh
# 创建Nginx容器
root@master:~# kubectl create deployment nginx --image=nginx
deployment.apps/nginx created

# 查看状态
root@master:~# kubectl get pods
NAME                     READY   STATUS              RESTARTS   AGE
nginx-7854ff8877-6tpft   0/1     ContainerCreating   0          8s

# 暴露端口
root@master:~# kubectl expose deployment nginx --port=80 --type=NodePort
service/nginx exposed

# 查看一下对外的端口
root@master:~# kubectl get pod,svc
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-7854ff8877-jfnn7   1/1     Running   0          119s

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
service/kubernetes   ClusterIP   10.96.0.1     <none>        443/TCP        6m7s
service/nginx        NodePort    10.111.38.8   <none>        80:31803/TCP   4s

# 外部访问
http://192.168.31.120:31803

问题汇总

初始化超时

报错信息

sh
# 报错信息
kubelet-check Initial timeout of 40s passed.
# 报错详情
failed to pull image \"registry.k8s.io/pause:3.6\": failed to pull and unpack image \"registry.k8s.io/pause:3.6\": f>
y.k8s.io/pause:3.6\": failed to pull image \"registry.k8s.io/pause:3.6\": failed to pull and unpack image \"registry.k8s.io/pause:3.6\": failed>
ry.k8s.io/pause:3.6\": failed to pull image \"registry.k8s.io/pause:3.6\": failed to pull and unpack image \"registry.k8s.io/pause:3.6\": faile>
8d5)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-vm-0-3_kube-system(f3061661aec984b83229f5c3781ef8d5)\\\": rpc er>
/pause/manifests/3.6\": dial tcp 173.194.174.82:443: i/o timeout" host=registry.k8s.io

解决方法

sh
# 初始化配置
containerd config default > /etc/containerd/config.toml

# 修改配置
vim /etc/containerd/config.toml
# sandbox_image = "registry.k8s.io/pause:3.6" 更新配置如下
sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"

# SystemdCgroup = false 更新配置如下
SystemdCgroup = true

# 重启服务
systemctl restart containerd.service

# 查看服务状态
systemctl status containerd.service

# 查看组件版本信息
containerd -version
ctr version

没有找到主机

报错信息

sh
[WARNING Hostname]: hostname "node1" could not be reached
[WARNING Hostname]: hostname "node1": lookup node1 on 192.168.31.1:53: no such host

解决方案

sh
cat >> /etc/hosts << EOF
192.168.31.120 master
192.168.31.121 node1
192.168.31.122 node2
EOF

kubectl连接被拒绝

报错信息

sh
E1214 09:43:11.283902   19067 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1214 09:43:11.284740   19067 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1214 09:43:11.286597   19067 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1214 09:43:11.288326   19067 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E1214 09:43:11.291358   19067 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?

解决方案

sh
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

crictl报错

报错信息

sh
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
E1221 10:25:56.883414    4315 remote_image.go:119] "ListImages with filter from image service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\"" filter="&ImageFilter{Image:&ImageSpec{Image:,Annotations:map[string]string{},},}"
FATA[0000] listing images: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"

解决方案

sh
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 0
debug: false
pull-image-on-create: false
EOF

容器没有运行

报错信息

sh
# 报错信息
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR CRI]: container runtime is not running: output: time="2023-12-13T15:59:50+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"

解决方案

sh
# 查看 container 服务
systemctl status containerd

# 修改配置
vim /etc/containerd/config.toml
# 修改以下配置
#disabled_plugins = ["cri"]

# 重启 container 服务
systemctl restart containerd