kube-vip provides load balancing and high availability for the control plane using Linux Virtual Server (LVS/IPVS) technology. It advertises a virtual IP address on the cluster nodes and can use LVS to forward traffic to the control-plane nodes for load balancing. If the node holding the virtual IP fails, kube-vip automatically moves the virtual IP to another healthy node, so the cluster stays reachable.
With kube-vip you can provide the following capabilities in a Kubernetes cluster:
- High-availability control plane: kube-vip keeps the control-plane components (etcd, API Server, Controller Manager and Scheduler) highly available, so the cluster keeps running even if a node fails.
- Load balancing: kube-vip can use LVS to balance traffic across multiple control-plane nodes, giving better performance and scalability.
- Rolling upgrades: during Kubernetes version upgrades or node maintenance, kube-vip automatically migrates the virtual IP, so you can upgrade in a rolling fashion without interrupting the cluster.
Scenario overview
VIP           172.16.10.54
k8s-master01  172.16.10.50
k8s-master02  172.16.10.51
k8s-master03  172.16.10.52
k8s-node02    172.16.10.53
Basic configuration
First, apply the basic configuration to every node: disable the firewall, set up the bridge packet-filtering sysctls, enable ipvs, and configure time synchronization. Pay attention to the `modprobe br_netfilter` step. ipvs is a Linux kernel module that implements high-performance load balancing at the transport layer (Layer 4). Modules loaded with modprobe are lost after a reboot; to avoid that, persist the setting, for example with echo "br_netfilter" > /etc/modprobe.d/modules.conf
Disable the firewall:
# systemctl stop firewalld
# systemctl disable firewalld

Disable SELinux:
# sed -i 's/enforcing/disabled/' /etc/selinux/config   # permanent
# setenforce 0                                         # temporary

Disable swap:
# swapoff -a         # temporary
# vim /etc/fstab     # permanent: comment out the swap line

Pass bridged IPv4 traffic to the iptables chains (run on every machine) so packets crossing the bridge are filtered:
# cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
# modprobe br_netfilter   # the br_netfilter module must be loaded for IPv4 forwarding and the bridge sysctls
# sysctl --system         # apply

Install ipvs:
# cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
# chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
nf_conntrack_ipv4      19149  0
nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
ip_vs_sh               12688  0
ip_vs_wrr              12697  0
ip_vs_rr               12600  0
ip_vs                 145458  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          143360  2 ip_vs,nf_conntrack_ipv4
libcrc32c              12644  3 xfs,ip_vs,nf_conntrack
# yum install ipset -y
# yum install ipvsadm -y

Time synchronization:
# yum install chrony -y
# systemctl enable chronyd
# systemctl start chronyd
# chronyc sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^- ntp7.flashdance.cx            2  10   333   725    -33ms[ -33ms] +/-  135ms
^- ntp6.flashdance.cx            2  10    73   607    -16ms[ -16ms] +/-  142ms
^+ dns1.synet.edu.cn             2  10   357   402   -360us[-360us] +/- 8084us
^* dns2.synet.edu.cn             1  10   367   650   +177us[+222us] +/- 7121us

# echo "vm.swappiness = 0" >> /etc/sysctl.d/k8s.conf
# sysctl -p /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
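Note that modprobe only loads a module until the next reboot. Besides the echo shown above, a common way to have both br_netfilter and the ipvs modules loaded automatically at boot is systemd's modules-load mechanism. The sketch below uses a file name of my own choosing (k8s.conf); adjust it to taste:

cat > /etc/modules-load.d/k8s.conf <<EOF
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF
# reload the module list now instead of waiting for the next boot
systemctl restart systemd-modules-load.service

nf_conntrack_ipv4 only exists on the CentOS 7 kernel used in this article; on newer kernels the module is simply nf_conntrack.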
Install containerd
With the basic configuration done, install containerd. Starting with Kubernetes 1.20, Docker is no longer the default supported container runtime (dockershim was deprecated and later removed in 1.24); containerd is used here because it keeps things simple.
Install containerd with yum:
# yum install -y yum-utils device-mapper-persistent-data lvm2
# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo   # add the yum repo
# yum install containerd.io cri-tools jq -y
# containerd config default > /etc/containerd/config.toml
# ctr version
Client:
  Version:  1.6.22
  Revision: 8165feabfdfe38c65b599c4993d227328c231fca
  Go version: go1.19.11

Server:
  Version:  1.6.22
  Revision: 8165feabfdfe38c65b599c4993d227328c231fca
  UUID: 7fa75843-35f9-4f26-b50a-33ed04e64f26

Alternatively, download cri-containerd-cni-1.6.22-linux-amd64.tar.gz from https://github.com/containerd/containerd/releases/, upload it to the server, unpack it, and set the environment variables:
# wget https://download.fastgit.org/containerd/containerd/releases/download/v1.6.22/cri-containerd-cni-1.6.22-linux-amd64.tar.gz
# tar -C / -xzf cri-containerd-cni-1.6.22-linux-amd64.tar.gz
# echo '''export PATH=$PATH:/usr/local/bin:/usr/local/sbin''' >> /etc/profile
# source /etc/profile
containerd configuration
Like Docker, containerd should be configured with registry mirrors to avoid timeouts when pulling images hosted abroad.
# mkdir -p /etc/containerd
# containerd config default > /etc/containerd/config.toml   # generate the config file

Edit the config file:
[plugins."io.containerd.grpc.v1.cri"]
  ......................
  sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"   # use a mirror for the pause image
  ........................
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    ....................
    SystemdCgroup = true   # change false to true
  [plugins."io.containerd.grpc.v1.cri".registry]
    config_path = ""
    [plugins."io.containerd.grpc.v1.cri".registry.auths]
    [plugins."io.containerd.grpc.v1.cri".registry.configs]
    [plugins."io.containerd.grpc.v1.cri".registry.headers]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]   # add a registry mirror
        endpoint = ["https://bqr1dr1n.mirror.aliyuncs.com"]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]  # add a registry mirror
        endpoint = ["https://registry.aliyuncs.com/k8sxio"]

Restart containerd:
# systemctl daemon-reload
# systemctl enable --now containerd
# systemctl restart containerd
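Since kubeadm and crictl both talk to containerd through the CRI socket, it is convenient to point crictl at the same endpoint and to verify that the two edits above actually took effect. This is just a convenience sketch, not part of the required procedure; the /etc/crictl.yaml values shown are the usual containerd socket paths:

cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
EOF

# confirm the sandbox_image and SystemdCgroup edits are in place
grep -E 'sandbox_image|SystemdCgroup' /etc/containerd/config.toml
# confirm the CRI endpoint answers
crictl info > /dev/null && echo "containerd CRI is reachable"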
Install kubeadm, kubelet, and kubectl
kubeadm, kubelet, and kubectl must all be the same version to avoid trouble later. Install them with yum and enable them to start on boot.
# Add the yum repo
# cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# yum makecache fast -y
# yum install -y --nogpgcheck kubelet kubeadm kubectl
# --disableexcludes=kubernetes disables every repo except kubernetes
$ yum install -y kubelet-1.24.1 kubeadm-1.24.1 kubectl-1.24.1 --disableexcludes=kubernetes
# list the versions available in the repo
$ yum --showduplicates list kubectl
# if already installed, check the version with kubectl version
$ systemctl start kubelet
$ systemctl enable kubelet
systemctl enable --now kubelet   # start at boot
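A quick sanity check (a sketch) to confirm that the three components ended up at the same version before going any further:

kubeadm version -o short
kubelet --version
kubectl version --client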
Everything above must be installed on every node: both master and worker nodes need containerd as well as kubeadm, kubelet, and kubectl.
Load balancer
First generate the kube-vip static Pod manifest and define the variables kube-vip needs. Keep in mind that this yaml file is removed whenever the node is reset, so after every re-initialization you must regenerate the kube-vip manifest.
# mkdir -p /etc/kubernetes/manifests/
# export VIP=172.16.10.54
# export INTERFACE=eth0
# ctr image pull docker.io/plndr/kube-vip:v0.6.2
# ctr run --rm --net-host docker.io/plndr/kube-vip:v0.6.2 vip \
  /kube-vip manifest pod \
  --interface $INTERFACE \
  --vip $VIP \
  --controlplane \
  --services \
  --arp \
  --leaderElection | tee /etc/kubernetes/manifests/kube-vip.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: vip_interface
      value: eth0
    - name: port
      value: "6443"
    - name: vip_cidr
      value: "32"
    - name: cp_enable
      value: "true"
    - name: cp_namespace
      value: kube-system
    - name: vip_ddns
      value: "false"
    - name: svc_enable
      value: "true"
    - name: vip_leaderelection
      value: "true"
    - name: vip_leaseduration
      value: "5"
    - name: vip_renewdeadline
      value: "3"
    - name: vip_retryperiod
      value: "1"
    - name: vip_address
      value: 172.16.10.54
    image: ghcr.io/kube-vip/kube-vip:v0.6.2
    imagePullPolicy: Always
    name: kube-vip
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        - SYS_TIME
    volumeMounts:
    - mountPath: /etc/kubernetes/admin.conf
      name: kubeconfig
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/admin.conf
    name: kubeconfig
status: {}
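The interface name differs between machines (eth0, ens192, and so on). If you are unsure which one to export before generating the manifest above, a small convenience sketch like the following picks the interface that carries the default route; it is my own shortcut, not part of the kube-vip procedure:

export VIP=172.16.10.54
# take the "dev" field of the default route as the interface name
export INTERFACE=$(ip -4 route show default | awk '{print $5; exit}')
echo "using interface: ${INTERFACE}"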
The only difference between a master node and a worker node is that the master runs the load balancer and the worker does not. At this point one master node is fully prepared. Note that the second master must also generate the kube-vip manifest before it joins the cluster. Next, initialize the cluster.
Modify the initialization configuration file
Generate an initialization file with kubeadm and then edit it; pay attention to the comments added in the file. This template uses the flannel network plugin; if you use the Calico plugin instead, simply leave the podSubnet field empty: podSubnet: ""
Initialize the master node:
# kubeadm config print init-defaults --component-configs KubeletConfiguration > kubeadm.yaml   # generate the default init configuration
# cat kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.16.10.50   # IP of the current node
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: 172.16.10.50   # if you use a hostname here, it must resolve via /etc/hosts
  taints:              # taint the master so application pods are not scheduled on it
  - effect: "NoSchedule"
    key: "node-role.kubernetes.io/master"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs   # kube-proxy mode
---
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.24.1   # Kubernetes version
controlPlaneEndpoint: 172.16.10.54:6443   # control-plane endpoint (the VIP)
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
  certSANs:   # add the other master nodes; any names listed here must be resolvable via /etc/hosts
  - 172.16.10.54
  - 172.16.10.50
  - 172.16.10.51
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16   # pod subnet; with the Calico plugin leave it empty: podSubnet: ""
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
Note: if certSANs contains service names (hostnames) and the VIP is accessed by domain name, those hostnames must be resolvable via /etc/hosts, and the same hostname must also be used in kubeadm.yaml, otherwise resolution fails. Also, when a hostname is used, `kubectl get node` shows the hostname in the NAME column; when an IP is used, the IP is shown instead.
.....................
nodeRegistration:
  criSocket: /run/containerd/containerd.sock   # containerd Unix socket
  imagePullPolicy: IfNotPresent
  name: k8s-master01
..........................
controlPlaneEndpoint: api.k8s.local:6443   # control-plane endpoint
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
  certSANs:   # add the other master nodes
  - api.k8s.local
  - k8s-master01
  - k8s-master02
  - k8s-master03
  - 192.168.31.30
  - 192.168.31.31
  - 192.168.31.32
The /etc/hosts entries must match what is configured here exactly; if they do not, the names will not be found during initialization.
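For reference, a minimal /etc/hosts sketch for the environment used in this article, run on every node. The api.k8s.local name comes from the hostname-based example above; the IPs are the ones from the scenario overview, so adjust both to your own environment:

cat >> /etc/hosts <<EOF
172.16.10.54 api.k8s.local
172.16.10.50 k8s-master01
172.16.10.51 k8s-master02
172.16.10.52 k8s-master03
172.16.10.53 k8s-node02
EOF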
Download the images needed for initialization; just pull them directly.
# kubeadm config images pull --config kubeadm.yaml
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.1
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.24.1
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.24.1
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.24.1
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.7
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.5.3-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:v1.8.6
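Because the images are pulled through containerd's CRI, they land in the k8s.io containerd namespace; a quick way to confirm they are on disk (a sketch):

ctr -n k8s.io images ls | grep google_containers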
Initialize the cluster
Initialize the cluster with the yaml file configured above.
# kubeadm init --upload-certs --config kubeadm.yaml
[init] Using Kubernetes version: v1.24.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [172.16.10.50 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.16.10.50 172.16.10.54 172.16.10.51]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [172.16.10.50 localhost] and IPs [172.16.10.50 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [172.16.10.50 localhost] and IPs [172.16.10.50 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 10.011741 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key: 3a356484fecaaea190f52c359c6182e08297f742dd1cda3fd8054b8b0558c08c
[mark-control-plane] Marking the node 172.16.10.50 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node 172.16.10.50 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  # add a master node
  kubeadm join 172.16.10.54:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408 \
    --control-plane --certificate-key 3a356484fecaaea190f52c359c6182e08297f742dd1cda3fd8054b8b0558c08c

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

  # add a worker node
  kubeadm join 172.16.10.54:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408
Initialization succeeded. Follow the printed instructions to copy the kubeconfig on this node, and you will be able to see the Kubernetes cluster.
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config
# kubectl get node
NAME           STATUS   ROLES           AGE    VERSION
172.16.10.50   Ready    control-plane   175m   v1.24.1
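At this stage the control-plane static Pods, kube-proxy, CoreDNS (Pending until the CNI plugin is installed) and the kube-vip Pod should all be visible; a quick check looks like:

kubectl get pods -n kube-system -o wide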
The node name shows up as an IP address here because I used the IP address when initializing.
Add master nodes to the cluster
Because this is a high-availability setup, k8s-master02 also has to join the cluster as a control-plane node. You can join it with the credentials generated during initialization above, or generate new ones with commands. Note that the join process resolves the local hostname and may report that it cannot be found; add an /etc/hosts entry mapping k8s-master02 to 172.16.10.51 (its own IP) and then join. If the credentials from the initial init are older than 24 hours, create new ones and assemble the join command:
# kubeadm token create --print-join-command --ttl=0
kubeadm join 172.16.10.54:6443 --token nmw4yn.5dv52o9s8gcrzip5 --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408
# kubeadm init phase upload-certs --upload-certs
I0907 12:10:57.554477   28856 version.go:255] remote version is much newer: v1.28.1; falling back to: stable-1.24
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key: 1de3f4baf520e102ef1836e3e774edf46c0ebcba047f475931344ffc9b9bcbd0

Assembled join command:
# kubeadm join 172.16.10.54:6443 --token nmw4yn.5dv52o9s8gcrzip5 --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408 --control-plane --certificate-key 1de3f4baf520e102ef1836e3e774edf46c0ebcba047f475931344ffc9b9bcbd0
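The assembly can also be scripted so the two values do not have to be copied by hand. The sketch below runs on an existing master and assumes the certificate key is the last line of the upload-certs output, as in the listing above:

# grab the certificate key and the basic join command, then print the full control-plane join command
CERT_KEY=$(kubeadm init phase upload-certs --upload-certs 2>/dev/null | tail -n 1)
JOIN_CMD=$(kubeadm token create --print-join-command)
echo "${JOIN_CMD} --control-plane --certificate-key ${CERT_KEY}"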
Run the assembled command on the new master node. k8s-master03 is added in the same way. With that, all three master nodes are deployed.
Install the network plugin
At this point the three master nodes are installed, but no network plugin is running yet. Since we specified flannel when generating the kubeadm.yaml file, install flannel:
# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# If a node has multiple NICs, specify the internal NIC in the manifest:
# find the DaemonSet named kube-flannel-ds and edit the kube-flannel container
# vi kube-flannel.yml
......
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.14.0
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=eth0   # with multiple NICs, name the internal NIC here
......
# kubectl apply -f kube-flannel.yml   # install the flannel network plugin
You can also leave the manifest unmodified, depending on your environment.

# Calico network plugin
# curl https://docs.projectcalico.org/manifests/calico.yaml -O
# The latest Calico version supported by k8s v1.20 is v3.20, so the correct yaml for that release is:
# https://docs.projectcalico.org/archive/v3.20/manifests/calico.yaml
# kubectl apply -f calico.yaml
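Once the manifest is applied, the flannel DaemonSet pods should go Running on every node and the nodes should flip from NotReady to Ready; a quick check (a sketch):

kubectl get pods -A -o wide | grep flannel
kubectl get nodes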
Add worker nodes
Master nodes should only run the cluster components; applications run on the worker nodes, so we add worker nodes to the cluster. You can use the credentials generated during initialization, or use the command-line approach: run `kubeadm token create --print-join-command` on a master, then execute the printed command on the node.
# On a master node
# kubeadm token create --print-join-command
kubeadm join 172.16.10.54:6443 --token ncoknr.k0i67be6yhw0m27s --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408

# On the worker node
# kubeadm join 172.16.10.54:6443 --token ncoknr.k0i67be6yhw0m27s --discovery-token-ca-cert-hash sha256:c656da07f1a79168d392a4c59807a89293df04a514736a2bea147425a1b59408
[preflight] Running pre-flight checks
        [WARNING Hostname]: hostname "k8s-node02" could not be reached
        [WARNING Hostname]: hostname "k8s-node02": lookup k8s-node02 on 211.167.230.100:53: no such host
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

# kubectl get node
NAME           STATUS     ROLES           AGE     VERSION
172.16.10.50   Ready      control-plane   3h25m   v1.24.1
k8s-master02   Ready      control-plane   178m    v1.24.1
k8s-master03   Ready      control-plane   96m     v1.24.1
k8s-node02     NotReady   <none>          6m32s   v1.24.1
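Worker nodes joined this way show <none> under ROLES. That is purely cosmetic, but if you prefer a label, you can add one yourself (optional, my own choice of label value):

kubectl label node k8s-node02 node-role.kubernetes.io/worker=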
That completes the deployment of the highly available Kubernetes cluster. Next, test whether losing a node affects the cluster as a whole.
Test the load balancer
The goal of the high-availability test is to make sure the cluster stays usable when one node goes down, so let's test it by hand. During initialization I named the 172.16.10.50 node by its IP address, which bothers me, so I will delete that node and re-join it, and check whether doing so affects the whole cluster. Before starting, check which node the VIP election has chosen as leader.
# kubectl logs -f kube-vip-k8s-master02 -n kube-system
time="2023-09-07T02:54:23Z" level=info msg="Starting kube-vip.io [v0.6.2]"
time="2023-09-07T02:54:23Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
time="2023-09-07T02:54:23Z" level=info msg="prometheus HTTP server started"
time="2023-09-07T02:54:23Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2023-09-07T02:54:23Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [k8s-master02]"
I0907 02:54:23.710713       1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-svcs-lock...
time="2023-09-07T02:54:23Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [k8s-master02]"
I0907 02:54:23.712441       1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-cp-lock...
E0907 02:54:33.722376       1 leaderelection.go:327] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": net/http: TLS handshake timeout
E0907 02:54:33.722386       1 leaderelection.go:327] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": net/http: TLS handshake timeout
time="2023-09-07T02:54:44Z" level=info msg="Node [k8s-master01] is assuming leadership of the cluster"
time="2023-09-07T02:54:44Z" level=info msg="new leader elected: k8s-master01"

The current VIP leader is k8s-master01, i.e. 172.16.10.50. I first run kubeadm reset on 172.16.10.50 to tear the node down, then on another master remove it with 'kubectl delete nodes 172.16.10.50'. Looking at the kube-vip logs afterwards, leadership has jumped to another node:

# kubectl logs -f kube-vip-k8s-master02 -n kube-system
time="2023-09-07T02:54:23Z" level=info msg="Starting kube-vip.io [v0.6.2]"
time="2023-09-07T02:54:23Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[true]"
time="2023-09-07T02:54:23Z" level=info msg="prometheus HTTP server started"
time="2023-09-07T02:54:23Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2023-09-07T02:54:23Z" level=info msg="beginning services leadership, namespace [kube-system], lock name [plndr-svcs-lock], id [k8s-master02]"
I0907 02:54:23.710713       1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-svcs-lock...
time="2023-09-07T02:54:23Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [k8s-master02]"
I0907 02:54:23.712441       1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-cp-lock...
E0907 02:54:33.722376       1 leaderelection.go:327] error retrieving resource lock kube-system/plndr-cp-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-cp-lock": net/http: TLS handshake timeout
E0907 02:54:33.722386       1 leaderelection.go:327] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://kubernetes:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": net/http: TLS handshake timeout
time="2023-09-07T02:54:44Z" level=info msg="Node [k8s-master01] is assuming leadership of the cluster"
time="2023-09-07T05:55:27Z" level=info msg="new leader elected: k8s-master03"
I0907 05:55:29.053911       1 leaderelection.go:255] successfully acquired lease kube-system/plndr-cp-lock
time="2023-09-07T05:55:29Z" level=info msg="Gratuitous Arp broadcast will repeat every 3 seconds for [172.16.10.54]"
time="2023-09-07T05:55:29Z" level=info msg="Node [k8s-master02] is assuming leadership of the cluster"
Then I add the 172.16.10.50 node back into the cluster using the join procedure above: generate the kube-vip manifest first, create the two credentials and assemble the join command, and run it on 172.16.10.50.
# kubectl get node
NAME           STATUS   ROLES           AGE     VERSION
k8s-master01   Ready    control-plane   19m     v1.24.1
k8s-master02   Ready    control-plane   3h34m   v1.24.1
k8s-master03   Ready    control-plane   132m    v1.24.1
k8s-node02     Ready    <none>          42m     v1.24.1
The node is back and everything is Ready again. Now look at the network interface on the master holding the VIP: we chose eth0 during initialization, and the virtual IP now sits on eth0 of that master. As long as the virtual IP exists somewhere, clients can still reach the control plane.
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:2b:e4:b6 brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.50/16 brd 172.16.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.16.10.54/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe2b:e4b6/64 scope link
       valid_lft forever preferred_lft forever
You can also deploy an application and then reboot or shut down one master node to see whether the whole cluster is affected; in my tests there was no disruption at all. Below are the problems I ran into during installation and how to fix them.
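One simple way to run that test (a sketch; the deployment name and the polling loop are my own): deploy something, then keep polling the API server through the VIP while you reboot one master. If the VIP fails over correctly the loop keeps answering.

kubectl create deployment nginx-test --image=nginx --replicas=2
# in another terminal, poll the API server through the VIP while a master reboots
while true; do
  kubectl --server=https://172.16.10.54:6443 get nodes --request-timeout=5s > /dev/null \
    && echo "$(date +%T) API via VIP OK" || echo "$(date +%T) API via VIP FAILED"
  sleep 2
done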
Issue 1
A master node fails to join the cluster with an error. The root cause is that the libseccomp version is too old; upgrading it fixes the problem.
ctr: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/default/vip/log.json: no such file or directory): runc did not terminate successfully: exit status 127: unknown

The cause is an outdated libseccomp, so upgrade it:
# rpm -qa | grep libseccomp
libseccomp-2.3.1-4.el7.x86_64
# rpm -e libseccomp-2.3.1-4.el7.x86_64 --nodeps
# rpm -qa | grep libseccomp
# wget https://rpmfind.net/linux/centos/8-stream/BaseOS/x86_64/os/Packages/libseccomp-2.5.1-1.el8.x86_64.rpm
# rpm -ivh libseccomp-2.5.1-1.el8.x86_64.rpm
warning: libseccomp-2.5.1-1.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 8483c65d: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:libseccomp-2.5.1-1.el8           ################################# [100%]
Issue 2
Master initialization kept timing out on its checks with a message that the kubelet was not running. Inspecting the logs with `systemctl status kubelet` showed that the node could not be found, and the kube-vip logs under the log directory showed that the current node could not be reached by its hostname. Either change the yaml to use IP addresses instead of hostnames, or add local resolution in /etc/hosts.
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

# The message says the kubelet is not running. To see more detail, add --v=5 to the init command. If you re-initialize, reset kubeadm first with 'kubeadm reset' and confirm with 'Y'. The kube-vip log below shows that requests to k8s-master01 fail because the certificate is not valid for that name; adding a local hosts entry fixes it.
2023-09-06T18:00:49.873411929+08:00 stderr F E0906 10:00:49.873310       1 leaderelection.go:325] error retrieving resource lock kube-system/plndr-svcs-lock: Get "https://k8s-master01:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/plndr-svcs-lock": x509: certificate is valid for 172.16.10.50, api.k8s.local, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not k8s-master01
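While the control plane is still down, kubectl is not available, so the kube-vip static Pod has to be inspected through the container runtime, exactly as the kubeadm output above suggests. A sketch (the container ID placeholder is whatever `ps -a` prints on your machine):

crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube-vip
# replace <CONTAINER_ID> with the ID printed by the previous command
crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs <CONTAINER_ID>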
Issue 3
Finally, here is the error you hit when a node tries to join the cluster after a reboot; the earlier part of the output is normal loading, and the failure point is shown below.
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

The cause is that the br_netfilter module is no longer loaded after the reboot. To keep it from being lost across reboots, persist it to a file:
echo "br_netfilter" > /etc/modprobe.d/modules.conf
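After the next reboot you can verify in a few seconds that the module and the related sysctls survived; a minimal check:

lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward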