11-K8s手动部署-单master¶
kubeadm 和二进制安装 k8s 的适用场景分析
kubeadm:官方提供的开源工具,用于快速搭建 kubernetes 集群,目前比较方便,也是推荐的方式。通过 kubeadm init 和 kubeadm join 这两个命令就可以快速创建集群。kubeadm 初始化的 k8s,所有组件都以 pod 形式运行,具备故障自恢复能力。kubeadm 相当于用程序脚本帮我们装好了集群,属于自动部署,简化了部署操作,证书、组件资源清单文件都是自动创建的;但自动部署屏蔽了很多细节,对各个模块感知较少,如果对 k8s 架构组件理解不深,遇到问题比较难排查。kubeadm 适合需要经常部署 k8s,或者对自动化要求比较高的场景。
二进制:在官网下载相关组件的二进制包手动安装,对 kubernetes 的理解也会更全面。
kubeadm 和二进制都适合生产环境,运行都很稳定,具体如何选择,可以根据实际项目进行评估。
环境准备¶
配置机器主机名(master1 node1 node2节点执行)
master1节点执行:
hostnamectl set-hostname master1 && bash
node1节点执行:
hostnamectl set-hostname node1 && bash
node2节点执行:
hostnamectl set-hostname node2 && bash
设置/etc/hosts保证主机名能够解析(master1 node1 node2节点执行)
cat >>/etc/hosts<<EOF
192.168.1.26 master1
192.168.1.27 node1
192.168.1.28 node2
EOF
设置部署节点到其它所有节点的SSH免密码登录(master1节点执行)
yum -y install sshpass
cat >/root/.ssh/config<<EOF
Host *
Port 22
User root
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF
cd /root/
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
sshpass -pP@sswd ssh-copy-id master1
sshpass -pP@sswd ssh-copy-id node1
sshpass -pP@sswd ssh-copy-id node2
#检验测试master1节点是否可以免密所有机器
ssh master1 "hostname -I"
ssh node1 "hostname -I"
ssh node2 "hostname -I"
关闭交换分区 swap,提升性能
# 所有节点都要执行
swapoff -a
# Swap 是交换分区,如果机器内存不够,会使用 swap 分区,但是 swap 分区的性能较低,k8s 设计的时候为了能提升性能,默认是不允许使用交换分区的。kubeadm 初始化的时候会检测 swap 是否关闭,如果没关闭,初始化就会失败。如果不想关闭交换分区,安装 k8s 的时候可以指定 --ignore-preflight-errors=Swap 来解决。
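swapoff -a 只是临时关闭,机器重启后 swap 还会重新挂载。下面给出一个永久关闭的参考做法(示例,通过注释 /etc/fstab 中包含 swap 的行实现):
# 所有节点都要执行,注释掉 /etc/fstab 中的 swap 挂载行
sed -ri 's/.*swap.*/#&/' /etc/fstab
# 确认 swap 已经关闭,Swap 一行应全部为 0
free -m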
修改机器内核参数
# 所有节点都要执行
modprobe br_netfilter
echo "modprobe br_netfilter" >> /etc/profile
cat >/etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf
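可以用下面的命令简单确认内核模块和参数是否已经生效(示例):
# 任一节点执行,br_netfilter 应已加载,两个参数的值应为 1
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward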
配置阿里云yum仓库
rm -f /etc/yum.repos.d/*.repo
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
sed -i "/mirrors.aliyuncs.com/d" /etc/yum.repos.d/CentOS-Base.repo
sed -i "/mirrors.cloud.aliyuncs.com/d" /etc/yum.repos.d/CentOS-Base.repo
yum clean all
配置docker组件需要的阿里云的 repo 源
cat >/etc/yum.repos.d/docker-ce.repo<<\EOF
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=0
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg
EOF
yum repolist
配置安装 k8s 组件需要的阿里云的 repo 源
cat >/etc/yum.repos.d/kubernetes.repo<<\EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
yum repolist
开启 ipvs功能
# 所有节点都要执行
cat >/etc/sysconfig/modules/ipvs.modules<<\EOF
#!/bin/bash
ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack"
for kernel_module in ${ipvs_modules}; do
/sbin/modinfo -F filename ${kernel_module} > /dev/null 2>&1
if [ $? -eq 0 ]; then
/sbin/modprobe ${kernel_module}
fi
done
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep ip_vs
ipvs (IP Virtual Server) 实现了传输层负载均衡,也就是我们常说的 4 层 LAN 交换,作为 Linux 内核的一部分。ipvs 运行在主机上,在真实服务器集群前充当负载均衡器。ipvs 可以将基于 TCP 和 UDP的服务请求转发到真实服务器上,并使真实服务器的服务在单个 IP 地址上显示为虚拟服务。
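这里只是提前加载了 ipvs 相关的内核模块,kube-proxy 默认仍然使用 iptables 模式。如果后面希望 kube-proxy 真正以 ipvs 模式工作,可以在集群初始化完成后参考下面的思路修改(示例,非本文必须的步骤):
# 把 kube-proxy 的 configmap 中 mode: "" 改为 mode: "ipvs",然后重建 kube-proxy 的 pod
kubectl edit configmap kube-proxy -n kube-system
kubectl rollout restart daemonset kube-proxy -n kube-system
# 查看 ipvs 转发规则是否生成
ipvsadm -Ln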
安装基本工具包
# 所有节点都要执行
yum -y install ipvsadm conntrack ntpdate telnet vim
安装docker¶
安装 docker-ce
# 所有节点都要执行
yum install docker-ce-20.10.6 docker-ce-cli-20.10.6 containerd.io -y
启动 docker-ce
# 所有节点都要执行
systemctl start docker && systemctl enable docker.service ; systemctl status docker.service
配置 docker 镜像加速器和驱动
# 所有节点都要执行
cat >/etc/docker/daemon.json<<\EOF
{
"registry-mirrors":["https://rsbud4vc.mirror.aliyuncs.com","https://registry.docker-cn.com","https://docker.mirrors.ustc.edu.cn","https://dockerhub.azk8s.cn","http://hub-mirror.c.163.com","http://qtid6917.mirror.aliyuncs.com", "https://rncxm540.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload
systemctl restart docker
systemctl status docker
#把 docker 的 cgroup 驱动修改为 systemd(默认为 cgroupfs),kubelet 使用的 cgroup 驱动必须和 docker 保持一致,否则无法正常工作。
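可以用下面的命令确认驱动修改已经生效(示例):
# 所有节点都可以执行,输出应为 Cgroup Driver: systemd
docker info | grep -i "cgroup driver"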
安装master¶
安装初始化 k8s 需要的软件包
# 所有节点都要执行
yum install -y kubelet-1.20.6 kubeadm-1.20.6 kubectl-1.20.6
# 注:每个软件包的作用
Kubeadm: kubeadm 是一个工具,用来初始化 k8s 集群的
kubelet: 安装在集群所有节点上,用于启动 Pod 的
kubectl: 通过 kubectl 可以部署和管理应用,查看各种资源,创建、删除和更新各种组件
启动kubelet服务
# 所有节点都要执行
systemctl enable kubelet ; systemctl start kubelet
sleep 5
systemctl status kubelet
#此时 kubelet 的状态不是 running,这是正常现象,不用处理,等 k8s 控制面组件起来之后 kubelet 就会恢复正常。
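如果想确认 kubelet 具体的报错原因,可以查看它的日志(示例),初始化之前常见的报错是找不到 /var/lib/kubelet/config.yaml,属于正常现象:
journalctl -u kubelet --no-pager | tail -n 20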
使用 kubeadm 初始化 k8s 集群(master1节点执行)
kubeadm init --kubernetes-version=1.20.6 --apiserver-advertise-address=192.168.1.26 --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=SystemVerification
注:--image-repository registry.aliyuncs.com/google_containers 表示手动指定镜像仓库地址为 registry.aliyuncs.com/google_containers。kubeadm 默认从 k8s.gcr.io 拉取镜像,但是国内访问不到 k8s.gcr.io,所以需要指定从 registry.aliyuncs.com/google_containers 仓库拉取镜像。
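如果想在初始化之前先把所需镜像准备好,可以参考下面的做法(示例):
# master1节点执行,列出并提前拉取初始化所需的镜像
kubeadm config images list --kubernetes-version=1.20.6 --image-repository registry.aliyuncs.com/google_containers
kubeadm config images pull --kubernetes-version=1.20.6 --image-repository registry.aliyuncs.com/google_containers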
下面是初始化master节点成功后的信息
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.26:6443 --token 95k0et.9c4uitzqasdx5axa \
--discovery-token-ca-cert-hash sha256:9495462d474420d5e4ee3b39bb8a258997f7dfb9d76926baa4aaeaba167b436d
配置 kubectl 的配置文件 config,相当于对 kubectl 进行授权,这样 kubectl 命令可以使用这个证书对 k8s 集群进行管理
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
查看集群信息
[root@master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 NotReady control-plane,master 82s v1.20.6
此时集群状态还是 NotReady 状态,因为没有安装网络插件。
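在安装网络插件之前,还可以看到 coredns 的 pod 一直处于 Pending 状态,这同样是因为缺少网络插件(示例):
kubectl get pods -n kube-system -o wide | grep coredns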
安装node¶
安装 k8s 集群-添加第一个工作节点
[root@master1 ~]# kubeadm token create --print-join-command
kubeadm join 192.168.1.26:6443 --token nhw968.t906x9wfvgjxbgp1 --discovery-token-ca-cert-hash sha256:9495462d474420d5e4ee3b39bb8a258997f7dfb9d76926baa4aaeaba167b436d
node1节点执行如上命令
kubeadm join 192.168.1.26:6443 --token nhw968.t906x9wfvgjxbgp1 --discovery-token-ca-cert-hash sha256:9495462d474420d5e4ee3b39bb8a258997f7dfb9d76926baa4aaeaba167b436d
看到下面说明 node1 节点已经加入到集群了,充当工作节点
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
在 master1 上查看集群节点状况:
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 NotReady control-plane,master 3m20s v1.20.6
node1 NotReady <none> 46s v1.20.6
安装 k8s 集群-添加第二个工作节点
[root@master1 ~]# kubeadm token create --print-join-command
kubeadm join 192.168.1.26:6443 --token vrx60x.1sq6s9g752fe1ufr --discovery-token-ca-cert-hash sha256:9495462d474420d5e4ee3b39bb8a258997f7dfb9d76926baa4aaeaba167b436d
在node2节点执行如上命令
看到下面说明 node2 节点已经加入到集群了,充当工作节点
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
在 master1 上查看集群节点状况:
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 17m v1.20.6
node1 Ready <none> 14m v1.20.6
node2 Ready <none> 79s v1.20.6
安装calico¶
准备安装 calico 网络插件 yaml文件
[root@master1 ~]# yum -y install lrzsz
[root@master1 ~]# rz calico.yaml
注:在线下载配置文件地址是: https://docs.projectcalico.org/manifests/calico.yaml
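注意 calico.yaml 中的 CALICO_IPV4POOL_CIDR 最好和 kubeadm init 时指定的 --pod-network-cidr 保持一致(示例做法;calico 清单里该项默认一般是 192.168.0.0/16,且可能处于注释状态,需按实际文件调整):
# 查看并按需把 pod 网段改为 10.244.0.0/16
grep -A1 "CALICO_IPV4POOL_CIDR" calico.yaml
sed -i 's#192.168.0.0/16#10.244.0.0/16#' calico.yaml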
使用 yaml 文件安装 calico 网络插件
kubectl apply -f calico.yaml
检查服务启动状态(要等到所有容器都启动后就代表部署完毕)
[root@master1 ~]# kubectl get pod -n kube-system|grep calico
calico-kube-controllers-6949477b58-4f66x 1/1 Running 0 98s
calico-node-gxh7k 1/1 Running 0 98s
calico-node-qlwsj 1/1 Running 0 98s
然后检查集群状态为Ready 则表示部署成功
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 13m v1.20.6
node1 Ready <none> 11m v1.20.6
验证集群¶
验证集群网络
创建一个测试用的pod
kubectl run net-test --image=alpine -- sleep 360000
查看获取IP情况
[root@master1 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
net-test 1/1 Running 0 40s 10.244.104.1 node2 <none> <none>
测试集群pod网络是否可以联通
[root@master1 ~]# ping -c 4 10.244.104.1
PING 10.244.104.1 (10.244.104.1) 56(84) bytes of data.
64 bytes from 10.244.104.1: icmp_seq=1 ttl=63 time=0.384 ms
64 bytes from 10.244.104.1: icmp_seq=2 ttl=63 time=0.353 ms
64 bytes from 10.244.104.1: icmp_seq=3 ttl=63 time=0.415 ms
64 bytes from 10.244.104.1: icmp_seq=4 ttl=63 time=0.314 ms
验证集群服务
启动一个nginx服务
yum -y install git
git clone https://gitee.com/chriscentos/salt-kubebin.git
cd /root/salt-kubebin/example/
kubectl apply -f nginx-pod.yaml
查看pod启动情况
[root@master1 example]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
net-test 1/1 Running 0 3m44s 10.244.104.1 node2 <none> <none>
nginx-pod 1/1 Running 0 28s 10.244.104.2 node2 <none> <none>
验证nginx服务是否可用
[root@master1 ~]# curl 10.244.104.2
验证 coredns
coredns默认是安装的
[root@master1 ~]# kubectl get pod -ALL|grep dns
kube-system coredns-7f89b7bc75-bxwt5 1/1 Running 0 30m
kube-system coredns-7f89b7bc75-wl9pc 1/1 Running 0 30m
我们只需进入容器验证即可
[root@master1 ~]# kubectl exec -it net-test sh
/ # ping -c 4 www.baidu.com
PING www.baidu.com (180.101.49.12): 56 data bytes
64 bytes from 180.101.49.12: seq=0 ttl=48 time=24.178 ms
64 bytes from 180.101.49.12: seq=1 ttl=48 time=24.377 ms
64 bytes from 180.101.49.12: seq=2 ttl=48 time=23.878 ms
64 bytes from 180.101.49.12: seq=3 ttl=48 time=24.395 ms
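除了 ping 外网域名,也可以在容器里解析集群内部的 service 名称来验证 coredns(示例,net-test 使用的 alpine 镜像自带 busybox 版的 nslookup):
kubectl exec -it net-test -- nslookup kubernetes.default.svc.cluster.local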
扩容node¶
配置机器主机名(node3节点执行)
node3节点执行:
hostnamectl set-hostname node3 && bash
设置/etc/hosts保证主机名能够解析(node3节点执行)
cat >>/etc/hosts<<EOF
192.168.1.26 master1
192.168.1.27 node1
192.168.1.28 node2
192.168.1.29 node3
EOF
设置部署节点到其它所有节点的SSH免密码登录(master1节点执行)
cat >>/etc/hosts<<EOF
192.168.1.29 node3
EOF
sshpass -pP@sswd ssh-copy-id node3
#检验测试master1节点是否可以免密所有机器
ssh node3 "hostname -I"
关闭交换分区 swap,提升性能
# node3节点执行
swapoff -a
# Swap 是交换分区,如果机器内存不够,会使用 swap 分区,但是 swap 分区的性能较低,k8s 设计的时候为了能提升性能,默认是不允许使用交换分区的。kubeadm 初始化的时候会检测 swap 是否关闭,如果没关闭,初始化就会失败。如果不想关闭交换分区,安装 k8s 的时候可以指定 --ignore-preflight-errors=Swap 来解决。
修改机器内核参数
# node3节点执行
modprobe br_netfilter
echo "modprobe br_netfilter" >> /etc/profile
cat >/etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl -p /etc/sysctl.d/k8s.conf
配置阿里云yum仓库
rm -f /etc/yum.repos.d/*.repo
curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
sed -i "/mirrors.aliyuncs.com/d" /etc/yum.repos.d/CentOS-Base.repo
sed -i "/mirrors.cloud.aliyuncs.com/d" /etc/yum.repos.d/CentOS-Base.repo
yum clean all
配置docker组件需要的阿里云的 repo 源
cat >/etc/yum.repos.d/docker-ce.repo<<\EOF
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=0
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg
EOF
yum repolist
配置安装 k8s 组件需要的阿里云的 repo 源
cat >/etc/yum.repos.d/kubernetes.repo<<\EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
yum repolist
开启 ipvs功能
# node3节点执行
cat >/etc/sysconfig/modules/ipvs.modules<<\EOF
#!/bin/bash
ipvs_modules="ip_vs ip_vs_lc ip_vs_wlc ip_vs_rr ip_vs_wrr ip_vs_lblc ip_vs_lblcr ip_vs_dh ip_vs_sh ip_vs_nq ip_vs_sed ip_vs_ftp nf_conntrack"
for kernel_module in ${ipvs_modules}; do
/sbin/modinfo -F filename ${kernel_module} > /dev/null 2>&1
if [ $? -eq 0 ]; then
/sbin/modprobe ${kernel_module}
fi
done
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep ip_vs
ipvs (IP Virtual Server) 实现了传输层负载均衡,也就是我们常说的 4 层 LAN 交换,作为 Linux 内核的一部分。ipvs 运行在主机上,在真实服务器集群前充当负载均衡器。ipvs 可以将基于 TCP 和 UDP的服务请求转发到真实服务器上,并使真实服务器的服务在单个 IP 地址上显示为虚拟服务。
安装基本工具包
yum -y install ipvsadm conntrack ntpdate telnet vim
安装 docker-ce
yum install docker-ce-20.10.6 docker-ce-cli-20.10.6 containerd.io -y
启动 docker-ce
systemctl start docker && systemctl enable docker.service ; systemctl status docker.service
配置 docker 镜像加速器和驱动
cat >/etc/docker/daemon.json<<\EOF
{
"registry-mirrors":["https://rsbud4vc.mirror.aliyuncs.com","https://registry.docker-cn.com","https://docker.mirrors.ustc.edu.cn","https://dockerhub.azk8s.cn","http://hub-mirror.c.163.com","http://qtid6917.mirror.aliyuncs.com", "https://rncxm540.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
#把 docker 的 cgroup 驱动修改为 systemd(默认为 cgroupfs),kubelet 使用的 cgroup 驱动必须和 docker 保持一致,否则无法正常工作。
重启加载 docker-ce配置
systemctl daemon-reload
systemctl restart docker
systemctl status docker
安装初始化 k8s 需要的软件包
yum install -y kubelet-1.20.6 kubeadm-1.20.6 kubectl-1.20.6
# 注:每个软件包的作用
Kubeadm: kubeadm 是一个工具,用来初始化 k8s 集群的
kubelet: 安装在集群所有节点上,用于启动 Pod 的
kubectl: 通过 kubectl 可以部署和管理应用,查看各种资源,创建、删除和更新各种组件
启动kubelet服务
systemctl enable kubelet ; systemctl start kubelet
sleep 5
systemctl status kubelet
#此时 kubelet 的状态不是 running,这是正常现象,不用处理,节点加入集群、k8s 组件起来之后 kubelet 就会恢复正常。
安装 k8s 集群-添加第三个工作节点
[root@master1 ~]# kubeadm token create --print-join-command
kubeadm join 192.168.1.26:6443 --token vrx60x.1sq6s9g752fe1ufr --discovery-token-ca-cert-hash sha256:9495462d474420d5e4ee3b39bb8a258997f7dfb9d76926baa4aaeaba167b436d
在node3节点执行如上命令
看到下面说明 node3 节点已经加入到集群了,充当工作节点
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
在 master1 上查看集群节点状况:
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 4h29m v1.20.6
node1 Ready <none> 4h26m v1.20.6
node2 Ready <none> 4h13m v1.20.6
node3 Ready <none> 23s v1.20.6
缩容node¶
先查看一下当前集群的node节点信息
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 19m v1.20.6
node1 Ready <none> 17m v1.20.6
node2 Ready <none> 16m v1.20.6
node3 Ready <none> 3m33s v1.20.6
标记节点不可调度
kubectl cordon node3
驱逐这个node节点上的pod
[root@master1 ~]# kubectl drain node3 --delete-local-data --force --ignore-daemonsets
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/node3 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-xv5cp, kube-system/kube-proxy-b9mt5
evicting pod kube-system/coredns-7f89b7bc75-4tlt6
evicting pod default/nginx-deployment-5d47ff8589-fd68t
evicting pod default/nginx-deployment-5d47ff8589-kkjtv
evicting pod default/nginx-deployment-5d47ff8589-klqfc
evicting pod default/nginx-deployment-5d47ff8589-lwmn9
pod/coredns-7f89b7bc75-4tlt6 evicted
pod/nginx-deployment-5d47ff8589-fd68t evicted
pod/nginx-deployment-5d47ff8589-kkjtv evicted
pod/nginx-deployment-5d47ff8589-klqfc evicted
pod/nginx-deployment-5d47ff8589-lwmn9 evicted
node/node3 evicted
删除这个node节点
[root@master1 ~]# kubectl delete nodes node3
node "node3" deleted
然后在node3这个节点上执行如下命令:
[root@node3 ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
恢复node¶
安装 k8s 集群-添加第三个工作节点
[root@master1 ~]# kubeadm token create --print-join-command
kubeadm join 192.168.1.26:6443 --token vrx60x.1sq6s9g752fe1ufr --discovery-token-ca-cert-hash sha256:9495462d474420d5e4ee3b39bb8a258997f7dfb9d76926baa4aaeaba167b436d
在node3节点执行如上命令
看到下面说明 node3 节点已经加入到集群了,充当工作节点
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
在 master1 上查看集群节点状况:
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 20h v1.20.6
node1 Ready <none> 20h v1.20.6
node2 Ready <none> 19h v1.20.6
node3 Ready <none> 3s v1.20.6
安装dashboard¶
准备安装 dashboard 组件的 yaml 文件
[root@master1 ~]# yum -y install lrzsz
[root@master1 ~]# rz kubernetes-dashboard.yaml
使用 yaml 文件安装 dashboard 组件
kubectl apply -f kubernetes-dashboard.yaml
检查服务启动状态(要等到所有容器都启动后就代表部署完毕)
[root@master1 ~]# kubectl get pod -ALL|grep dashboard
kubernetes-dashboard dashboard-metrics-scraper-7445d59dfd-f2sr5 1/1 Running 0 63s
kubernetes-dashboard kubernetes-dashboard-54f5b6dc4b-sm5dh 1/1 Running 0 63s
我们需要将 kubernetes-dashboard 对应的 service 类型修改为**NodePort**模式
[root@master1 ~]# kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
将:
type: ClusterIP
修改为:
type: NodePort
查看dashboard对外暴露的端口
[root@master1 ~]# kubectl get service -ALL|grep kubernetes-dashboard|grep NodePort
kubernetes-dashboard kubernetes-dashboard NodePort 10.102.208.151 <none> 443:32325/TCP 4m50s
打开浏览器访问 https://192.168.1.26:32325/
登录dashboard(方法1)¶
通过 token 令牌访问 dashboard
通过 Token 登录 dashboard,需要先创建管理员 token,该 token 具有查看任何名称空间的权限,可以管理所有资源对象
[root@master1 ~]# kubectl create clusterrolebinding dashboard-cluster-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:kubernetes-dashboard
clusterrolebinding.rbac.authorization.k8s.io/dashboard-cluster-admin created
查看 kubernetes-dashboard 名称空间下的 secret
[root@master1 ~]# kubectl get secret -n kubernetes-dashboard|grep dashboard-token
kubernetes-dashboard-token-nq97t kubernetes.io/service-account-token 3 23m
找到对应的带有 token 的 secret,这里是 kubernetes-dashboard-token-nq97t
[root@master1 ~]# kubectl describe secret kubernetes-dashboard-token-nq97t -n kubernetes-dashboard
Name: kubernetes-dashboard-token-nq97t
Namespace: kubernetes-dashboard
Labels: <none>
Annotations: kubernetes.io/service-account.name: kubernetes-dashboard
kubernetes.io/service-account.uid: 8fbc2c9a-c1b5-419a-8716-80433932ff47
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1066 bytes
namespace: 20 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlUxdVYxYVpBMkx5N0dJUGZJTElINV8tNG5tUlZsMVExNllXTXBBVUJsUVEifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZC10b2tlbi1ucTk3dCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjhmYmMyYzlhLWMxYjUtNDE5YS04NzE2LTgwNDMzOTMyZmY0NyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDprdWJlcm5ldGVzLWRhc2hib2FyZCJ9.EPWqeTaWDHa0kR8SihnfxhlBKOpaLQFiaC-jLu_vC5Vy6Bf3VN9mZunRlJEcANUAJH8j2MO9150g9DrCd43hSQIlRIKdsX81i2vESTRHdK1zKXvTZgRuhEAf9Wmni7mbpxABbHxkaUkxEhty8n41O19h17Yo87cGRu5I3oDjmI9n7B1z5z11zVm8DTsSUCzYllFWyVLLBahH-rT0qloizSPZmr3OCU0aqSNiEzRg8X4rZUnor06xrPGXGmbvYwsYWbAPnXeca-k1dMpcjpdyYqsiatyXCOLke1a3ppaEYwRUv3vZ6ofvZCKXJY4Z_e3mwdrcxWkEHtnDpTsUHxlSYg
记住 token 后面的值,把上面的 token 值复制到浏览器的 token 登录处即可登录:
然后就可以看到dashboard界面啦
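如果不想手动 describe,也可以用一条命令直接取出 token(示例,假设 serviceaccount 名称仍为 kubernetes-dashboard):
kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa kubernetes-dashboard -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 -d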
登录dashboard(方法2)¶
通过 kubeconfig 文件访问 dashboard
切换到相关目录
cd /etc/kubernetes/pki
创建 cluster 集群
kubectl config set-cluster kubernetes --certificate-authority=./ca.crt --server="https://192.168.1.26:6443" --embed-certs=true --kubeconfig=/root/dashboard-admin.conf
查看
[root@master1 pki]# kubectl get secret -ALL|grep dashboard-token
kubernetes-dashboard kubernetes-dashboard-token-nq97t kubernetes.io/service-account-token 3 30m
创建 credentials,需要使用上面的 kubernetes-dashboard-token-nq97t 对应的 token 信息
DEF_NS_ADMIN_TOKEN=$(kubectl get secret kubernetes-dashboard-token-nq97t -n kubernetes-dashboard -o jsonpath={.data.token}|base64 -d)
kubectl config set-credentials dashboard-admin --token=$DEF_NS_ADMIN_TOKEN --kubeconfig=/root/dashboard-admin.conf
创建 context
kubectl config set-context dashboard-admin@kubernetes --cluster=kubernetes --user=dashboard-admin --kubeconfig=/root/dashboard-admin.conf
切换 context,把 current-context 设置为 dashboard-admin@kubernetes
kubectl config use-context dashboard-admin@kubernetes --kubeconfig=/root/dashboard-admin.conf
把刚才的 kubeconfig 文件 dashboard-admin.conf 复制到桌面
sz -y /root/dashboard-admin.conf
浏览器访问时使用 kubeconfig 认证,把刚才的 dashboard-admin.conf 导入到 web 界面,就可以登录了
使用dashboard¶
使用dashboard创建pod容器
我们点击加号开始创建pod
等待pod创建的结果(创建成功)
当然也可以使用命令行查看
[root@master1 pki]# kubectl get pod
NAME READY STATUS RESTARTS AGE
net-test 1/1 Running 0 3h4m
nginx-pod 1/1 Running 0 3h1m
nginx01-55c8d4f7cd-ddkl5 1/1 Running 0 65s
安装metrics¶
安装 metrics-server 组件
metrics-server 是一个集群范围的资源使用数据聚合工具。同样的,metrics-server 也只是采集和展示数据,并不提供数据存储服务,主要关注的是资源度量 API 的实现,比如 CPU、文件描述符、内存、请求延时等指标。metrics-server 收集的数据供 k8s 集群内部使用,如 kubectl、hpa、scheduler 等。
在/etc/kubernetes/manifests 里面改一下 apiserver 的配置
[root@master1 ~]# vim /etc/kubernetes/manifests/kube-apiserver.yaml
在如下内容
spec:
containers:
- command:
- kube-apiserver
增加如下内容:
- --enable-aggregator-routing=true
注意:这是 k8s 在 1.17 的新特性,如果是 1.16 版本可以不用添加,1.17 及以后的版本要添加。这个参数的作用是:Aggregation 允许在不修改 Kubernetes 核心代码的同时扩展 Kubernetes API。
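修改保存后,可以先确认参数已经写入静态 pod 清单(示例):
grep "enable-aggregator-routing" /etc/kubernetes/manifests/kube-apiserver.yaml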
重新更新 apiserver 配置:
[root@master1 ~]# kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.yaml
pod/kube-apiserver created
检查更新状态
[root@master1 ~]# kubectl get pods -n kube-system|grep apiserver
kube-apiserver 0/1 CrashLoopBackOff 1 22s
kube-apiserver-master1 1/1 Running 0 45s
等到 kube-apiserver-master1 pod 运行起来后,把处于 CrashLoopBackOff 状态的 kube-apiserver pod 删除
[root@master1 ~]# kubectl delete pods kube-apiserver -n kube-system
pod "kube-apiserver" deleted
上传 metrics 相关的镜像文件,必须导入到所有节点
docker load -i addon.tar.gz
docker load -i metrics-server-amd64-0-3-6.tar.gz
scp addon.tar.gz metrics-server-amd64-0-3-6.tar.gz root@node1:/root/
ssh node1 "docker load -i addon.tar.gz ; docker load -i metrics-server-amd64-0-3-6.tar.gz"
scp addon.tar.gz metrics-server-amd64-0-3-6.tar.gz root@node2:/root/
ssh node2 "docker load -i addon.tar.gz ; docker load -i metrics-server-amd64-0-3-6.tar.gz"
scp addon.tar.gz metrics-server-amd64-0-3-6.tar.gz root@node3:/root/
ssh node3 "docker load -i addon.tar.gz ; docker load -i metrics-server-amd64-0-3-6.tar.gz"
应用metrics yaml文件
[root@master1 ~]# rz metrics.yaml
[root@master1 ~]# kubectl apply -f metrics.yaml
检查metrics pod启动状态
[root@master1 ~]# kubectl get pods -n kube-system | grep metrics
metrics-server-6595f875d6-clxp6 2/2 Running 0 6s
测试kubectl top¶
查看pod的使用容量
[root@master1 ~]# kubectl top pods -n kube-system
NAME CPU(cores) MEMORY(bytes)
calico-kube-controllers-6949477b58-4f66x 1m 19Mi
calico-node-gxh7k 22m 93Mi
calico-node-jk7wv 21m 95Mi
calico-node-qlwsj 20m 96Mi
coredns-7f89b7bc75-bxwt5 2m 15Mi
coredns-7f89b7bc75-wl9pc 2m 17Mi
etcd-master1 9m 65Mi
kube-apiserver-master1 37m 429Mi
kube-controller-manager-master1 7m 53Mi
kube-proxy-4j24n 1m 17Mi
kube-proxy-7m8j7 1m 19Mi
kube-proxy-v9wsc 1m 16Mi
kube-scheduler-master1 3m 24Mi
metrics-server-6595f875d6-clxp6 75m 15Mi
查看集群node节点的使用容量
[root@master1 ~]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master1 124m 1% 1795Mi 5%
node1 75m 0% 1165Mi 3%
node2 75m 0% 1212Mi 3%
修改scheduler绑定的端口¶
把 scheduler、controller-manager 端口变成物理机可以监听的端口
[root@master1 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Unhealthy Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
etcd-0 Healthy {"health":"true"}
默认在 1.19 之后,10251 和 10252 这两个端口都是绑定在 127.0.0.1 上的,如果想通过 prometheus 监控,会采集不到数据,所以可以把端口绑定到物理机 IP 上,可按如下方法处理:
[root@master1 ~]# vim /etc/kubernetes/manifests/kube-scheduler.yaml
修改如下内容:
把--bind-address=127.0.0.1 变成--bind-address=192.168.1.26
把 httpGet: 字段下的 host 由 127.0.0.1 变成 192.168.1.26
把 --port=0 这一行删除
[root@master1 ~]# vim /etc/kubernetes/manifests/kube-controller-manager.yaml
把--bind-address=127.0.0.1 变成--bind-address=192.168.1.26
把 httpGet: 字段下的 host 由 127.0.0.1 变成 192.168.1.26
把 --port=0 这一行删除
#注意:192.168.1.26 是 k8s 的控制节点 master1 的 ip
修改之后在 k8s 各个节点重启下 kubelet
# 所有节点都要执行
systemctl restart kubelet
可以看到相应的端口已经被物理机监听了
[root@master1 ~]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
[root@master1 ~]# ss -ltnp|grep 10251
LISTEN 0 128 :::10251 :::* users:(("kube-scheduler",pid=80467,fd=7))
[root@master1 ~]# ss -ltnp|grep 10252
LISTEN 0 128 :::10252 :::* users:(("kube-controller",pid=80051,fd=7))
延长证书时间¶
查看证书有效时间:
[root@master1 ~]# openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -text |grep Not
Not Before: Apr 18 11:42:04 2022 GMT
Not After : Apr 15 11:42:04 2032 GMT
通过上面可看到ca证书有效期是10年,从2022到2032年
[root@master1 ~]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep Not
Not Before: Apr 18 11:42:04 2022 GMT
Not After : Apr 19 06:35:35 2023 GMT
通过上面可看到apiserver证书有效期是1年,从2022到2023年:
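除了用 openssl 逐个查看,1.20 版本的 kubeadm 也自带了检查证书有效期的命令,可以一次列出所有证书(示例):
kubeadm certs check-expiration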
延长证书过期时间
1.把update-kubeadm-cert.sh文件上传到master1
[root@master1 ~]# ll /root/update-kubeadm-cert.sh
-rw------- 1 root root 10756 Apr 10 15:09 /root/update-kubeadm-cert.sh
# 如果是多 master 集群,还需要把脚本分发到其它 master 节点,本文的单 master 环境可跳过这一步
scp update-kubeadm-cert.sh master2:/root/
scp update-kubeadm-cert.sh master3:/root/
2.在每个 master 节点执行如下命令(本文单 master 环境即 master1)
1)给update-kubeadm-cert.sh脚本授予可执行权限
[root@master1 ~]# chmod +x /root/update-kubeadm-cert.sh
2)执行下面命令,修改证书过期时间,把时间延长到10年
[root@master1 ~]# ./update-kubeadm-cert.sh all
3)在master1节点查询Pod是否正常,能查询出数据说明证书签发完成
[root@master1 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
net-test 1/1 Running 0 19h
nginx-deployment-5d47ff8589-fd68t 1/1 Running 0 19h
能够看到pod信息,说明证书签发正常
验证证书有效时间是否延长到10年
[root@master1 ~]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep Not
Not Before: Apr 19 07:42:41 2022 GMT
Not After : Apr 16 07:42:41 2032 GMT
通过上面可看到apiserver证书有效期是10年,从2022到2032年:
[root@master1 ~]# openssl x509 -in /etc/kubernetes/pki/apiserver-etcd-client.crt -noout -text |grep Not
Not Before: Apr 19 07:42:40 2022 GMT
Not After : Apr 16 07:42:40 2032 GMT
通过上面可看到etcd证书有效期是10年,从2022到2032年:
[root@master1 ~]# openssl x509 -in /etc/kubernetes/pki/front-proxy-ca.crt -noout -text |grep Not
Not Before: Apr 18 11:42:04 2022 GMT
Not After : Apr 15 11:42:04 2032 GMT
通过上面可看到front-proxy证书有效期是10年,从2022到2032年
修改node主机名称¶
如果我们想修改node的主机名称,需要先将该node缩容下线,修改主机名后再重新加入集群,才能完成修改
比如我们将node3 修改为 node-192e168e1e29
先查看一下当前集群的node节点信息
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 19m v1.20.6
node1 Ready <none> 17m v1.20.6
node2 Ready <none> 16m v1.20.6
node3 Ready <none> 3m33s v1.20.6
标记节点不可调度
kubectl cordon node3
驱逐这个node节点上的pod
[root@master1 ~]# kubectl drain node3 --delete-local-data --force --ignore-daemonsets
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/node3 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-xv5cp, kube-system/kube-proxy-b9mt5
evicting pod kube-system/coredns-7f89b7bc75-4tlt6
evicting pod default/nginx-deployment-5d47ff8589-fd68t
evicting pod default/nginx-deployment-5d47ff8589-kkjtv
evicting pod default/nginx-deployment-5d47ff8589-klqfc
evicting pod default/nginx-deployment-5d47ff8589-lwmn9
pod/coredns-7f89b7bc75-4tlt6 evicted
pod/nginx-deployment-5d47ff8589-fd68t evicted
pod/nginx-deployment-5d47ff8589-kkjtv evicted
pod/nginx-deployment-5d47ff8589-klqfc evicted
pod/nginx-deployment-5d47ff8589-lwmn9 evicted
node/node3 evicted
删除这个node节点
[root@master1 ~]# kubectl delete nodes node3
node "node3" deleted
然后在node3这个节点上执行如下命令:
[root@node3 ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
现在我们开始修改node3的主机名称(node3节点执行):
hostnamectl set-hostname node-192e168e1e29 && bash
修改master节点的/etc/hosts文件解析
[root@master1 ~]# cat /etc/hosts
192.168.1.26 master1
192.168.1.27 node1
192.168.1.28 node2
192.168.1.29 node-192e168e1e29
安装 k8s 集群-添加工作节点
[root@master1 ~]# kubeadm token create --print-join-command
kubeadm join 192.168.1.26:6443 --token vrx60x.1sq6s9g752fe1ufr --discovery-token-ca-cert-hash sha256:9495462d474420d5e4ee3b39bb8a258997f7dfb9d76926baa4aaeaba167b436d
在node-192e168e1e29(原node3)节点执行如上命令
看到下面说明 node-192e168e1e29 节点已经加入到集群了,充当工作节点
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
在 master1上查看集群节点状况:
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 2d19h v1.20.6
node-192e168e1e29 Ready <none> 18s v1.20.6
node1 Ready <none> 2d19h v1.20.6
node2 Ready <none> 4h33m v1.20.6
如果想恢复原来的主机名,按照以上步骤重新操作一遍即可。
修改node角色名称¶
我们可以设置node节点的角色名称为 work
kubectl get nodes --show-labels
[root@master1 ~]# kubectl label nodes node3 node-role.kubernetes.io/work=
node/node3 labeled
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 2d21h v1.20.6
node1 Ready <none> 2d21h v1.20.6
node2 Ready <none> 6h50m v1.20.6
node3 Ready work 133m v1.20.6
当然我们也可以取消角色名称
[root@master1 ~]# kubectl label nodes node3 node-role.kubernetes.io/work-
node/node3 labeled
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 2d21h v1.20.6
node1 Ready <none> 2d21h v1.20.6
node2 Ready <none> 6h50m v1.20.6
node3 Ready <none> 134m v1.20.6
node节点下线维修¶
https://www.csdn.net/tags/MtTaEg2sMjU5OTY0LWJsb2cO0O0O.html
场景:k8s集群中的node节点在正常的情况下,需要进行停机维修
首先我们先查看一下当前集群的节点
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 2d19h v1.20.6
node1 Ready <none> 2d19h v1.20.6
node2 Ready <none> 4h36m v1.20.6
node3 Ready <none> 30s v1.20.6
我们来模拟node2节点,需要计划性的下线维修,首先标记节点不可调度
kubectl cordon node2
查看一个node2节点的状态会变化为 SchedulingDisabled 如下:
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 2d19h v1.20.6
node1 Ready <none> 2d19h v1.20.6
node2 Ready,SchedulingDisabled <none> 4h46m v1.20.6
node3 Ready <none> 9m50s v1.20.6
接着我们需要将node2节点上的pod进行驱逐
[root@master1 ~]# kubectl drain node2 --delete-local-data --force --ignore-daemonsets
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/node2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-xv5cp, kube-system/kube-proxy-b9mt5
evicting pod kube-system/coredns-7f89b7bc75-4tlt6
evicting pod default/nginx-deployment-5d47ff8589-fd68t
evicting pod default/nginx-deployment-5d47ff8589-kkjtv
evicting pod default/nginx-deployment-5d47ff8589-klqfc
evicting pod default/nginx-deployment-5d47ff8589-lwmn9
pod/coredns-7f89b7bc75-4tlt6 evicted
pod/nginx-deployment-5d47ff8589-fd68t evicted
pod/nginx-deployment-5d47ff8589-kkjtv evicted
pod/nginx-deployment-5d47ff8589-klqfc evicted
pod/nginx-deployment-5d47ff8589-lwmn9 evicted
node/node2 evicted
参数如下:
--delete-local-data 删除本地数据,即使emptyDir也将删除;
--ignore-daemonsets 忽略DeamonSet,否则DeamonSet被删除后,仍会自动重建;
--force 不加force参数只会删除该node节点上的ReplicationController, ReplicaSet, DaemonSet,StatefulSet or Job,加上后所有pod都将删除;
此时 pod 会先被驱逐终止,再由控制器在其它节点重建,因此**服务中断时间=重建时间+服务启动时间+readiness探针检测正常时间**,必须等到 1/1 Running 服务才会恢复正常。所以对单副本的服务进行迁移时,服务中断是不可避免的。
然后我们将服务器关机维修,等服务器修好并再次开机后,将node节点恢复为可调度状态
kubectl uncordon node2
最终的集群状态如下:
[root@master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 2d19h v1.20.6
node1 Ready <none> 2d19h v1.20.6
node2 Ready <none> 4h54m v1.20.6
node3 Ready <none> 18m v1.20.6
node节点故障疏散¶
场景:k8s集群中突然有1个node节点因停电直接down机,且系统损坏无法启动,这个node上的pod应该怎么恢复,或者说怎么调度到其他node上呢
我们将node3节点进行直接关机,模拟服务器突然down机
[root@node3 ~]# poweroff
我们可以立马看到node3上的pod网络是不通的
[root@master1 ~]# kubectl get pod -o wide|grep node3
nginx-deployment-5d47ff8589-288mj 1/1 Running 0 8m48s 10.244.135.28 node3 <none> <none>
nginx-deployment-5d47ff8589-2lnch 1/1 Running 0 8m38s 10.244.135.77 node3 <none> <none>
nginx-deployment-5d47ff8589-2pjf9 1/1 Running 0 8m22s 10.244.135.156 node3 <none> <none>
[root@master1 ~]# ping -c 4 10.244.135.28
PING 10.244.135.28 (10.244.135.28) 56(84) bytes of data.
--- 10.244.135.28 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
等待一段时间后,我们可以看到node3节点上pod的状态会变成 Terminating
[root@master1 ~]# kubectl get pod -o wide|grep node3|head -3
nginx-deployment-5d47ff8589-288mj 1/1 Terminating 0 58m 10.244.135.28 node3 <none> <none>
nginx-deployment-5d47ff8589-2lnch 1/1 Terminating 0 57m 10.244.135.77 node3 <none> <none>
nginx-deployment-5d47ff8589-2pjf9 1/1 Terminating 0 57m 10.244.135.156 node3 <none> <none>
如果此时我们已经确认node3属于不可恢复的节点
就可以开始清理node3节点上异常的pod
[root@master1 ~]# cat clean_pod.sh
#!/bin/bash
# 故障节点列表,多个节点用空格分隔
node_list="node3"
for n in ${node_list}
do
# 统计该节点上非 kube-system 名称空间的异常 pod 数量
fail_pod_count=$(kubectl get pod -o wide -ALL |grep " ${n} "|grep -v kube-system|wc -l)
for m in `seq 1 $fail_pod_count`
do
fail_pod_name=$(kubectl get pod -o wide -ALL |grep " ${n} "|grep -v kube-system|awk 'NR=='$m'{print $2}')
fail_pod_namespace=$(kubectl get pod -o wide -ALL |grep " ${n} "|grep -v kube-system|awk 'NR=='$m'{print $1}')
# 只打印强制删除命令,确认无误后再手动执行
echo "kubectl delete pod $fail_pod_name -n $fail_pod_namespace --force --grace-period=0"
sleep 0.5
done
done
将打印出来的命令逐条执行即可
[root@master1 ~]# kubectl get pod -o wide|grep node3
最终确认node3上的异常pod已清理完毕
删除这个node节点
[root@master1 ~]# kubectl delete nodes node3
node "node3" deleted
每当删除namespace或pod 等一些Kubernetes资源时,有时资源状态会卡在terminating,很长时间无法删除,甚至有时增加--force flag(强制删除)之后还是无法正常删除。这时就需要edit该资源,将字段finalizers设置为null,之后Kubernetes资源就正常删除了。
当删除pod时有时会卡住,pod状态变为terminating,无法删除pod
(1)强制删除
kubectl delete pod xxx -n xxx --force --grace-period=0
(2)如果强制删除还不行,设置finalizers为空
(如果一个容器已经在运行,这时需要对一些容器属性进行修改,又不想删除容器,或不方便通过replace的方式进行更新。kubernetes还提供了一种在容器运行时,直接对容器进行修改的方式,就是patch命令。)
kubectl patch pod xxx -n xxx -p '{"metadata":{"finalizers":null}}'
node节点限制pod¶
https://blog.51cto.com/zhangxueliang/2969910
k8s 更改pod数量限制(默认每个节点最多110个pod),以及 0/3 nodes are available: 3 Insufficient cpu 之类调度报错的排查
我们目前有3个node节点,也就是说默认最多只能创建330个左右的pod,现在我们将nginx-deployment扩容到350个副本,看下效果
kubectl scale deployment nginx-deployment --replicas 350
我们可以看到很多pod的状态为Pending,这就表示每个节点上的pod已经达到110个的上限,不能再继续增加了
[root@master1 ~]# kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 322/350 350 322 2d18h
那些没有创建出来的pod 一直处于Pending状态
[root@master1 pod]# kubectl get pod -ALL |grep Pending
default nginx-deployment-5d47ff8589-2jgmn 0/1 Pending 0 3m1s
default nginx-deployment-5d47ff8589-2vctp 0/1 Pending 0 3m2s
default nginx-deployment-5d47ff8589-5nqgl 0/1 Pending 0 3m2s
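可以先确认节点当前的 pod 数量上限(示例):
# 查看 node2 节点可分配的 pod 数,默认应为 110
kubectl get node node2 -o jsonpath='{.status.allocatable.pods}'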
我们可以通过修改kubelet的启动参数来调高node的pod数量限制(下面以node2为例)
cat >/etc/sysconfig/kubelet<<\EOF
KUBELET_EXTRA_ARGS="--fail-swap-on=false --max-pods=1000"
EOF
[root@node2 ~]# vi /usr/lib/systemd/system/kubelet.service
# 确认 [Service] 段中引用了上面的环境变量文件,没有则补充如下配置
[Service]
EnvironmentFile=-/etc/sysconfig/kubelet
systemctl daemon-reload
systemctl restart kubelet
我们可以看到那些处于Pending状态的pod会被陆续创建出来
[root@master1 ~]# kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 350/350 350 350 2d18h
如果想恢复默认的pod数量限制,方法如下
cat >/etc/sysconfig/kubelet<<\EOF
KUBELET_EXTRA_ARGS=
EOF
systemctl daemon-reload
systemctl restart kubelet
systemctl status kubelet