TANKENQI.cn

May 28, 2024

K8S集群环境搭建(Containerd作为容器)


1 前述

1.1 云原生定义

1.2 容器、虚拟机、Docker、Openstack 和 K8S

1.3 K8S 和 云原生

在单机上运行容器,无法发挥它的最大效能,只有形成集群,才能最大程度发挥容器的良好隔离、资源分配与编排管理的优势,而对于容器的编排管理,Swarm、Mesos 和 Kubernetes 的大战已经基本宣告结束,Kubernetes 成为了无可争议的赢家。

1.4 K8S 介绍


1.5 基本概念

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

1.6 K8S 常见命令

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  containers:
  - name: redis
    image: redis
    volumeMounts:
    - name: redis-persistent-storage
      mountPath: /data/redis
  volumes:
  - name: redis-persistent-storage
    hostPath:
      path: /data/
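围绕上面这类 Pod 定义,日常最常用的几条 kubectl 命令大致如下(仅作速查示例,其中文件名 nginx-pod.yaml 为假设的示例文件名):

# 创建 / 更新资源(nginx-pod.yaml 为示例文件名)
kubectl apply -f nginx-pod.yaml
# 查看 Pod 列表与详情
kubectl get pods -o wide
kubectl describe pod nginx
# 查看日志、进入容器
kubectl logs nginx
kubectl exec -it nginx -- /bin/bash
# 删除资源
kubectl delete -f nginx-pod.yaml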


1.7 K8S 常用运维命令

# 排空节点,驱逐其上的 Pod,准备维护(新版 kubectl 中 --delete-local-data 已更名为 --delete-emptydir-data)
kubectl drain --delete-local-data --ignore-daemonsets NODENAME
# 维护完成后恢复调度
kubectl uncordon NODENAME

2 K8S 集群基础环境部署

若服务器之前搭建过 K8S 集群,需要彻底删除

参考:https://blog.csdn.net/qq_43159578/article/details/124131709

sudo systemctl stop kubelet
sudo systemctl stop containerd
yum remove kubeadm kubectl kubelet kubernetes-cni -y
# 清除残留文件
rm -rf /root/.kube
rm -rf /etc/cni/net.d
rm -rf /etc/kubernetes/*
rm -rf /var/lib/etcd
# sudo ipvsadm -C
# sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
kubeadm reset

kubeadm reset 命令用于清除当前节点上所有与 kubeadm init 或 kubeadm join 命令创建的 Kubernetes 集群相关的状态。其主要作用如下:

  1. 停止并清理本节点上由 kubeadm 配置的相关组件运行状态,如 kubelet 管理的静态 Pod、kube-proxy 等。

  2. 删除当前节点上由 kubelet 创建的容器与 Pod。

  3. 若当前节点运行了本地(stacked)etcd 成员,则移除该 etcd 成员及其数据。

  4. 删除 Kubernetes 数据目录,包括证书、密钥、kubeconfig 文件等。

使用 kubeadm reset 命令可以清除当前节点上的所有 Kubernetes 相关状态,以便重新创建新的 Kubernetes 集群或者将当前节点加入到另一个 Kubernetes 集群中。在进行 kubeadm reset 操作之前,应该先备份当前节点上的重要数据和配置信息,以便在需要时进行恢复。
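在执行 kubeadm reset 之前,可以先做一个简单的备份(示意,备份目录 /root/k8s-backup 为假设的示例路径,可按需调整):

mkdir -p /root/k8s-backup
# 备份证书、密钥、kubeconfig 等配置
tar czf /root/k8s-backup/kubernetes-conf.tar.gz /etc/kubernetes
# 备份本地 etcd 数据目录(仅控制面节点存在)
tar czf /root/k8s-backup/etcd-data.tar.gz /var/lib/etcd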

2.1 服务器

2.2 安装过程

2.2.1 前提条件

a. 节点之中不可以有重复的主机名、MAC 地址或 product_uuid

cat /sys/class/dmi/id/product_uuid

b. 检查网络适配器:若有多个网卡,确保每个node的子网通过默认路由可达
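可以先确认默认路由走的是哪块网卡,并测试到其他节点的连通性(示意,目标 IP 以本文的内网节点为例):

# 查看默认路由及出口网卡
ip route show default
# 测试到其他节点的连通性(示例 IP)
ping -c 3 192.168.0.204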

c. 防火墙开放端口(所有节点)


systemctl restart firewalld
firewall-cmd --zone=public --add-port=443/tcp --permanent
firewall-cmd --zone=public --add-port=6443/tcp --permanent
firewall-cmd --zone=public --add-port=2379-2380/tcp --permanent
firewall-cmd --zone=public --add-port=10250/tcp --permanent
firewall-cmd --zone=public --add-port=10259/tcp --permanent
firewall-cmd --zone=public --add-port=10257/tcp --permanent
# --permanent 规则需要重载后才会生效
firewall-cmd --reload

d. 关闭防火墙(所有节点)

systemctl stop firewalld NetworkManager
systemctl disable firewalld NetworkManager

e. 关闭交换分区并禁用 SELinux(所有节点)

# 查看交换分区
free -m
# 将 SELinux 设置为 permissive 模式(相当于将其禁用);第一行临时生效,第二行永久生效
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# 关闭 swap;第一行临时生效,第二行永久生效
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
# 允许 iptables 检查桥接流量(K8s 官方要求)
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# 让配置生效
sysctl --system

f. 时间同步(所有节点)

yum install chrony -y
systemctl start chronyd
systemctl enable chronyd
chronyc sources

g. 设置主机名并添加 ip 映射(所有节点)

# 以 gisweb4 为例
# 查看主机名
cat /etc/hostname
# 设置主机名
hostnamectl set-hostname gisweb4
# 更新
bash
# 添加 ip 映射
echo "125.220.153.23  gisweb4" >> /etc/hosts
# 两台无外网 ip 的刀片添加内网 ip
# /etc/hosts 文件内容如下:
192.168.0.203 gisweb1
192.168.0.202 gisweb2
192.168.0.204 gisweb4
192.168.0.208 gisweb3
192.168.0.176 dellm640-01
192.168.0.177 dellm640-03
192.168.0.209 dellslot04
125.220.153.26 gisweb1
125.220.153.25 gisweb2
125.220.153.22 gisweb3
125.220.153.23 gisweb4
125.220.153.28 dellm640-01

2.2.2 升级 Linux 内核到最新(所有节点):

清除缓存,重新构建缓存:

# 清除并重建缓存,同时升级系统软件包
yum makecache && yum -y update

参考链接:https://zhuanlan.zhihu.com/p/368879345

2.2.3 转发 IPv4 并让 iptables 看到桥接流量(所有节点)

# a. 验证 br_netfilter 是否已经加载
lsmod | grep br_netfilter
# b. 加载 br_netfilter 模块
modprobe br_netfilter
# c. iptables 桥接
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
# 设置所需的 sysctl 参数,参数在重新启动后保持不变
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
# 应用 sysctl 参数而不重新启动
sysctl --system

2.2.4 安装 ipvsadm(所有节点)

yum install ipvsadm ipset sysstat conntrack libseccomp -y
cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
EOF
systemctl restart systemd-modules-load.service
lsmod | grep -e ip_vs -e nf_conntrack

2.2.5 修改内核参数(所有节点,lb除外)

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory = 1
vm.panic_on_oom = 0
fs.inotify.max_user_watches = 89100
fs.file-max = 52706963
fs.nr_open = 52706963
net.netfilter.nf_conntrack_max = 2310720
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.ip_conntrack_max = 65536
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
net.ipv6.conf.lo.disable_ipv6 = 0
net.ipv6.conf.all.forwarding = 1
EOF
sysctl --system

2.2.6 安装Container Runtime(选用containerd,弃用docker):

官方安装教程:https://github.com/containerd/containerd/blob/main/docs/getting-started.md

# 安装 containerd.io
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install containerd.io

# 安装 CNI 插件
# i. 从 https://github.com/containernetworking/plugins/releases 下载 cni-plugins 压缩包
# 在线下载:
# wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
# ii. 在 /opt/cni/bin 下解压:
# 把 22 上的拷贝到没有公网 ip 的服务器上
# scp -P22 /opt/cni/bin/cni-plugins-linux-amd64-v1.1.1.tgz root@192.168.0.203:/opt/cni/bin/
mkdir -p /opt/cni/bin
cd /opt/cni/bin
tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.1.1.tgz

# 重启服务:
systemctl restart containerd
# 开机启动:
systemctl enable containerd

# 配置 systemd cgroup 驱动
# 生成默认配置文件,再将 SystemdCgroup 改为 true(注意配置项区分大小写)
containerd config default | sudo tee /etc/containerd/config.toml
vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
# 并将 sandbox_image 地址修改为国内地址:
# 将 sandbox_image = "registry.k8s.io/pause:3.6"
# 修改为 sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"

# 启动 containerd
systemctl restart containerd
# 开机自启
systemctl enable containerd
# 启动成功后可以查看到监听的端口
netstat -nlput | grep containerd
tcp        0      0 127.0.0.1:36669         0.0.0.0:*               LISTEN      8665/containerd      off (0.00/0/0)
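如果不想手工编辑 config.toml,也可以用 sed 直接替换上面两处配置(示意,针对 containerd config default 生成的默认配置文件):

sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sed -i 's#registry.k8s.io/pause:3.6#registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6#' /etc/containerd/config.toml
systemctl restart containerd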

2.2.7 在所有电脑上安装 kubeadm, kubelet and kubectl(所有节点)

# a. kubeadm: the command to bootstrap the cluster.
# b. kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers.
# c. kubectl: the command line util to talk to your cluster.
  1. 配置阿里云的 k8s 源,加速安装
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
  2. 将 SELinux 设置为 permissive 模式(相当于将其禁用)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
  3. 安装
yum install -y --nogpgcheck kubelet-1.25.2 kubeadm-1.25.2 kubectl-1.25.2
# sudo yum install -y --nogpgcheck kubelet-1.26.3 kubeadm-1.26.3 kubectl-1.26.3
# 自启动
systemctl enable --now kubelet

2.2.8 启动控制面节点

kubeadm init --kubernetes-version=v1.25.2 --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16
# kubeadm init --kubernetes-version=v1.26.3 --image-repository registry.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16
若出现如下错误(上一次集群初始化的残留文件)
# 如果出现报错:
# [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
执行如下命令后再次初始化
rm -rf /var/lib/etcd
rm -rf /etc/kubernetes/manifests/*

2.2.9 配置环境变量(初始化后)

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
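配置好 kubeconfig 后,可以先简单验证 apiserver 是否可达、控制面组件是否正常(示意):

kubectl get nodes
kubectl get pods -n kube-system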

2.2.10 将 master 作为node(管理节点上执行)

kubectl describe nodes gisweb4 |grep Taints
# 本次删除的污点为:node-role.kubernetes.io/control-plane-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-

2.2.11 安装 Pod 网络插件(CNI:Container Network Interface)(master)

你必须部署一个基于容器网络接口(CNI)的 Pod 网络插件,以便 Pod 之间可以相互通信。

确保 kubeadm 初始化时 --pod-network-cidr 为 10.244.0.0/16
curl -LO https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
vim kube-flannel.yml
# 在 flannel 容器启动参数中添加:- --iface-regex=^192.168..

kubectl apply -f kube-flannel.yml
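应用之后,可以等待 flannel 相关 Pod 就绪,并确认节点状态变为 Ready(示意;flannel 的命名空间随版本不同,可能是 kube-flannel 或 kube-system):

kubectl get pods -n kube-flannel -o wide
kubectl get nodes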

2.2.12 node 节点加入集群

kubeadm token create --print-join-command
# 返回如下
kubeadm join 125.220.153.23:6443 --token x0wdaj.d5wltdzdtos22fl6 --discovery-token-ca-cert-hash sha256:9245d363cdeb1757bacba21b9ccdc06a28e7490bcedfb0eeb404b56f769fa112

如果此步报如下错误

The connection to the server localhost:8080 was refused - did you specify the right host or port?
  1. 出现这个问题的原因是 kubectl 命令需要使用 kubernetes-admin 的身份来运行,而 kubeadm init 启动集群时已经生成了 /etc/kubernetes/admin.conf;
  2. 因此,解决方法是将主节点中的 /etc/kubernetes/admin.conf 文件拷贝到工作节点相同目录下;
  3. 然后分别在工作节点上配置环境变量:

解决方案

# 将主节点中的 /etc/kubernetes/admin.conf 文件拷贝到工作节点相同目录下:
scp -P22 /etc/kubernetes/admin.conf oge@125.220.153.22:/etc/kubernetes/
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
kubeadm reset

2.2.13 添加新的 master 节点

# 管理节点查看
kubeadm token create --print-join-command
# 生成如下
kubeadm join 125.220.153.23:6443 --token mc56rw.t9b3d1ql53yhom9y --discovery-token-ca-cert-hash sha256:753ccf865a9c590413043d469a9848300871afaef7221e3fdb97d981939a2b83
# 管理节点上传证书
kubeadm init phase upload-certs --upload-certs
# 输出
I0413 11:00:30.817038   10009 version.go:256] remote version is much newer: v1.27.0; falling back to: stable-1.25
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
70d43cde7f6423b5c3e88c0fa9d08511cefbc53992dc63a13544cd548a912941
# 管理节点编辑 kubeadm-config,在 networking 前添加:controlPlaneEndpoint: 125.220.153.23:6443
kubectl -n kube-system edit cm kubeadm-config
# 新节点
# 重新加入的话,检查是否需要 kill 掉 6443 端口(这是之前的 api-server 服务)
kubeadm join 125.220.153.23:6443 --token mc56rw.t9b3d1ql53yhom9y --discovery-token-ca-cert-hash sha256:753ccf865a9c590413043d469a9848300871afaef7221e3fdb97d981939a2b83 --control-plane --certificate-key 70d43cde7f6423b5c3e88c0fa9d08511cefbc53992dc63a13544cd548a912941
# 成功后输出如下
This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

        mkdir -p $HOME/.kube
        sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
        sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.

3 K8S 管理平台 dashboard 环境部署(管理节点)

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.6.1/aio/deploy/recommended.yaml
kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
# 将其中的 type: ClusterIP 修改成 type: NodePort,保存退出即可。
# 查看服务的暴露端口,需在安全组放行
kubectl get svc -A | grep kubernetes-dashboard


# 创建访问用户
kubectl apply -f https://kuboard.cn/install-script/k8s-dashboard/auth.yaml
# 获取访问令牌
kubectl -n kubernetes-dashboard create token admin-user
# 生成的令牌
eyJhbGciOiJSUzI1NiIsImtpZCI6IkdVQTZzb3JEM1FHdkpxVDNsSEwtVEZWc2hyR08tbmFFWnFGX2Q2OGt5cEkifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNjgzNzM1MTQ1LCJpYXQiOjE2ODM3MzE1NDUsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiMzBlMWQzNDEtNDc0Yi00M2MyLWIyNzYtZGIxZTU4NzM5ZTgxIn19LCJuYmYiOjE2ODM3MzE1NDUsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbi11c2VyIn0.mg1IU29tBpH23nglJVbRmVa2A26WZjXxMCzckQyb-dnviLBRzBuvNebb8H4YH5CNJUPkB2GGC_r7dlm5zEbPpK8RqkbwXy-wqkOoMephs83gIQkJ3RgskpWqTgqqg87e6WXHRtuzYsQQZ4Rq3Y4uQy9jJS9o1lIoRTujpmpWORb9vu4JN0RqEfK2chQqNsYCe_TCtvtvkP2EyuU3QSeYdvWh5NNZ9CYwA8l8eqA6ijrmTqZjnI6Q9Ymo7trKSuGFmffotBpN9dTYZoyv6Io_VgEz6_1oHsA0pwG-3wc41Ly11sDAzwjZvoGN1yfw0vsVcwnAjH4LkRG2ImwYIcZbig

4 安装K8S的包管理工具Helm (管理节点)

参考:https://helm.sh/docs/intro/install/

参考:https://www.cnblogs.com/zhanglianghhh/p/14165995.html
github 地址:https://github.com/helm/helm


cd ~/k8s/helm
wget https://get.helm.sh/helm-v3.11.3-linux-amd64.tar.gz
tar zxfv helm-v3.11.3-linux-amd64.tar.gz
mv ./linux-amd64/helm /usr/bin/
# 显示版本,安装完成
helm version

5 安装K8S的包管理工具 krew(管理节点)

参考:https://krew.sigs.k8s.io/docs/user-guide/setup/install/

git version
# 若未安装
yum -y install git
(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)
# 永久写入用户的环境变量文件,避免登出后失效
echo 'export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
# 另一种方式:手工下载二进制
wget https://github.com/kubernetes-sigs/krew/releases/latest/download/krew-linux_amd64.tar.gz
tar -zxvf krew-linux_amd64.tar.gz
# 添加 $HOME/.krew/bin 目录到 PATH 环境变量
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
# 配置为 kubectl 插件
mv ./krew-linux_amd64 ./kubectl-krew
mv ./kubectl-krew /usr/local/bin/
# 测试
kubectl krew version
kubectl krew
kubectl plugin list

6 为 K8S 创建 PV 持久卷

6.1 PV和PVC

6.2 用 storageClass 动态创建 PV

# 安装 NFS
yum -y install nfs-utils rpcbind
# 设置共享目录权限
chown -R nobody:nfsnobody /mnt/storage/k8s/pv
# chmod -R 777 /mnt/storage/k8s/pv
vim /etc/exports
# 添加:
/mnt/storage/k8s/pv 192.168.0.0/24(rw,sync,no_root_squash)
# 以上设置让该网段内的 IP 都可以挂载
systemctl start rpcbind
systemctl enable rpcbind
systemctl enable nfs
systemctl start nfs
systemctl start nfs-server
systemctl enable nfs-server
systemctl start firewalld
firewall-cmd --permanent --add-service=nfs
firewall-cmd --reload
systemctl stop firewalld && sudo systemctl disable firewalld
exportfs -rv
showmount -e 127.0.0.1
yum install -y nfs-utils
# 每个节点挂载 nfs 服务端的存储目录,本次 nfs 服务端在 gisweb4(192.168.0.204)上
mount -t nfs 192.168.0.204:/mnt/storage/k8s/pv /mnt/storage/k8s/pv
# 检查挂载情况
df -h

参考:https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner

# 添加 helm 仓库
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
# 更新 helm repo
helm repo update
# 搜索 helm 库中 nfs-subdir-external-provisioner 的版本
helm search repo nfs-subdir-external-provisioner
# 安装
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=192.168.0.204 \
  --set nfs.path=/mnt/storage/k8s/pv \
  --set image.repository=registry.cn-hangzhou.aliyuncs.com/xzjs/nfs-subdir-external-provisioner \
  --set image.tag=v4.0.0

参考:http://www.mydlq.club/article/109/#%E5%88%9B%E5%BB%BA-nfs-subdir-external-provisioner-%E9%83%A8%E7%BD%B2%E6%96%87%E4%BB%B6
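部署完成后,可以用一个最小的 PVC 验证动态供给是否正常(示意;storageClass 名称假定为该 chart 默认生成的 nfs-client,文件名 test-pvc.yaml 为示例):

# test-pvc.yaml(示例文件名)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 1Gi

kubectl apply -f test-pvc.yaml
# STATUS 变为 Bound 即说明动态供给正常
kubectl get pvc test-nfs-pvc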

7 安装 kubeAPPS 可视化软件管理工具

参考:https://kubeapps.dev/docs/latest/tutorials/getting-started/

# 添加 kubeapps 仓库
helm repo add bitnami https://charts.bitnami.com/bitnami
# 创建 kubeapps 的命名空间
kubectl create namespace kubeapps
# 安装
helm install kubeapps --namespace kubeapps bitnami/kubeapps
# 创建用于访问 Kubeapps 和 Kubernetes 的演示凭证
kubectl create --namespace default serviceaccount kubeapps-operator
kubectl create clusterrolebinding kubeapps-operator --clusterrole=cluster-admin --serviceaccount=default:kubeapps-operator
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: kubeapps-operator-token
  namespace: default
  annotations:
    kubernetes.io/service-account.name: kubeapps-operator
type: kubernetes.io/service-account-token
EOF
# 获取访问令牌
kubectl get --namespace default secret kubeapps-operator-token -o go-template='{{.data.token | base64decode}}'
# 输出的令牌
eyJhbGciOiJSUzI1NiIsImtpZCI6IkdVQTZzb3JEM1FHdkpxVDNsSEwtVEZWc2hyR08tbmFFWnFGX2Q2OGt5cEkifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Imt1YmVhcHBzLW9wZXJhdG9yLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6Imt1YmVhcHBzLW9wZXJhdG9yIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTNjY2M0N2YtZWFmMS00NDY4LWJkN2ItYTVhMzliMzJjMzExIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50OmRlZmF1bHQ6a3ViZWFwcHMtb3BlcmF0b3IifQ.qsTBQODZLD1EUP5WjF_ju0-_ZFoJa2pEGCGf2zoLK71TjZeytD0GUGp4Z5ACNFuJMtedtx8tRgWhioU2oimxGdCIL4f7Szt0dOQgXD15HmoiUjYEcDQNsfTdcmfZw-m3-zwtTqa3kTTG3Wio0wf_f_ayw8qZCDL2i3PK-7h0QeAb1rQhtCz_e8huNrcshjixGlyw8aKUvdi2hPe6yvpxKJqQeOalNhT22b-ax28oIyqmC-NXYUMyRbEsgOjyuJAv6XdjqsQKbOGMKsTtNyf7CvnHl88hfRZpF0W-GuKj1ggKGYClTHuXnsv9QP-AQN1UaEtcAbUp08bHN9isedJL6w
# 因为是测试环境,因此直接采用 NodePort 方式暴露服务端口
kubectl edit svc kubeapps -n kubeapps
kubectl get svc -A | grep kubeapps

8 在 K8S 上部署虚拟机服务 Kubevirt

vim /etc/kubernetes/manifests/kube-apiserver.yaml
# 设置 --allow-privileged=true
virt-host-validate qemu
# 如果显示没有这个命令,先安装 libvirt 和 qemu 软件包:
yum install -y qemu-kvm libvirt virt-install bridge-utils


# 1. 编辑 grub 配置
vim /etc/default/grub
# 2. 在 GRUB_CMDLINE_LINUX 中添加 intel_iommu=on
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet intel_iommu=on"
# 3. 重新生成 grub 配置
grub2-mkconfig -o /boot/grub2/grub.cfg
# 4. 重启
reboot
# K8S 1.25 版本,KubeVirt 必须 0.57.2 以上才能适配
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/v0.58.0-rc.0/kubevirt-operator.yaml
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/v0.58.0-rc.0/kubevirt-cr.yaml
kubectl -n kubevirt wait kv kubevirt --for condition=Available

参考:https://kubevirt.io/labs/kubernetes/lab2.html

yum install -y tigervnc
kubectl get pods -n kubevirt


kubectl krew install virt
kubectl virt help
# 1. 创建虚拟机的 yaml 文件(示例见下文)
# 2. 运行一个虚拟机
kubectl apply -f test.yaml
# 3. 查看虚拟机
kubectl get vmis
# 4. 停止/删除虚拟机
kubectl delete -f vmi.yaml
# 或者
kubectl delete vmis testvmi
# 5. 开始/停止/暂停虚拟机
virtctl start/stop/pause my-vm
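下面给出一个可供参考的最小 VirtualMachineInstance 定义(示意,参照 KubeVirt 官方 lab 的常见写法;名称 testvmi 与 cirros 演示镜像均可按需替换),可保存为上文提到的 test.yaml:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi
spec:
  domain:
    devices:
      disks:
        - name: containerdisk
          disk:
            bus: virtio
    resources:
      requests:
        memory: 64M
  volumes:
    - name: containerdisk
      containerDisk:
        # 演示用的 cirros 镜像,生产环境请替换为自己的系统盘镜像
        image: quay.io/kubevirt/cirros-container-disk-demo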

9 在 K8S 上部署 PostgreSQL

helm repo add bitnami https://charts.bitnami.com/bitnami
# 首先检查是否有 oge 这个命名空间,没有则执行如下命令创建
kubectl create ns oge
# postgresql 这个名字(release 名)可以自己定义,但后面每一步都要注意对应更改
helm install postgresql -n oge bitnami/postgresql \
  --set global.storageClass=nfs-client \
  --set readReplicas.persistence.storageClass=nfs-client \
  --set primary.persistence.storageClass=nfs-client \
  --set primary.persistence.size=200Gi \
  --set readReplicas.persistence.size=200Gi \
  --set image.tag=14.5.0-debian-11-r6
helm install postgresql -n geoctap bitnami/postgresql \
  --set global.storageClass=nfs-client \
  --set readReplicas.persistence.storageClass=nfs-client \
  --set primary.persistence.storageClass=nfs-client \
  --set primary.persistence.size=200Gi \
  --set readReplicas.persistence.size=200Gi \
  --set image.tag=14.5.0-debian-11-r6
# 指定版本,可在 kubeapps 里面查看
# --set image.tag=14.5.0-debian-11-r6
kubectl get secret --namespace oge postgresql -o jsonpath="{.data.postgres-password}" | base64 -d
# 密码 7jXf2gsmUX
kubectl edit svc --namespace oge postgresql
# 将 type: ClusterIP 改为 NodePort
# b8:85:84:71:64:28
echo "SUBSYSTEM==\"net\", ACTION==\"add\", DRIVERS==\"?*\", ATTR{address}==\" b8:85:84:71:64:28\", ATTR{type}==\"1\", KERNEL==\"eno*\", NAME=\"eno1\"" >> /etc/udev/rules.d/70-persistent-net.rules
kubectl get deployment
# 发现并没有 postgresql
kubectl get all -n oge
# 发现有 statefulset.apps/postgresql
# 设置副本集个数为 1
kubectl scale --replicas=1 statefulset.apps/postgresql -n oge
# 进入 pgsql 的 pod
kubectl exec -it -n oge postgresql-0 bash
# 用户登录
psql -U postgres
# 输入密码 7jXf2gsmUX
psql -h 125.220.153.23 -p 30865 -U postgres -W -f ./public.sql

10 在 K8S 上部署 MySQL

helm repo add bitnami https://charts.bitnami.com/bitnami
# 安装
helm install -n oge mysql bitnami/mysql \
  --set global.storageClass=nfs-client \
  --set readReplicas.persistence.storageClass=nfs-client \
  --set primary.persistence.storageClass=nfs-client \
  --set primary.persistence.size=200Gi \
  --set readReplicas.persistence.size=200Gi
kubectl get secret --namespace oge mysql -o jsonpath="{.data.mysql-root-password}" | base64 -d
# 密码 VubCMiHvT1
kubectl edit svc --namespace oge mysql
# 将 type: ClusterIP 改为 NodePort
# b8:85:84:71:64:28
echo "SUBSYSTEM==\"net\", ACTION==\"add\", DRIVERS==\"?*\", ATTR{address}==\" b8:85:84:71:64:28\", ATTR{type}==\"1\", KERNEL==\"eno*\", NAME=\"eno1\"" >> /etc/udev/rules.d/70-persistent-net.rules
  1. 缩放副本集
kubectl get deployment
# 发现并没有 mysql
kubectl get all -n oge
# 发现有 statefulset.apps/mysql
kubectl scale --replicas=1 statefulset.apps/mysql -n oge
  2. 在 K8S 中进入数据库
kubectl exec -it -n oge mysql-1 bash
# 进入后登录用户
mysql -u root -p
# 输入密码

11 在K8S上部署 MongoDB

helm repo add bitnami https://charts.bitnami.com/bitnami
# 安装
helm install -n ydy mongodb bitnami/mongodb \
  --set global.storageClass=nfs-client \
  --set readReplicas.persistence.storageClass=nfs-client \
  --set primary.persistence.storageClass=nfs-client \
  --set primary.persistence.size=100Gi \
  --set readReplicas.persistence.size=100Gi
kubectl get secret --namespace ydy mongodb -o jsonpath="{.data.mongodb-root-password}" | base64 -d
# 密码 WUL9FPQ2V9
kubectl edit svc --namespace ydy mongodb
# 将 type: ClusterIP 改为 NodePort
# b8:85:84:71:64:28
echo "SUBSYSTEM==\"net\", ACTION==\"add\", DRIVERS==\"?*\", ATTR{address}==\" b8:85:84:71:64:28\", ATTR{type}==\"1\", KERNEL==\"eno*\", NAME=\"eno1\"" >> /etc/udev/rules.d/70-persistent-net.rules
  1. 缩放副本集
kubectl get deployment
# 发现并没有 mongodb
kubectl get all -n ydy
# 发现有 statefulset.apps/mongodb
kubectl scale --replicas=1 statefulset.apps/mongodb -n ydy
  2. 在 K8S 中进入数据库
kubectl exec -it -n ydy mongodb-644c657c4f-x62cn bash

12 在 K8S 上部署 Apache Spark

在 K8S 上运行 Spark 有两种方式:第一种是 Spark 官方提出的 Spark On K8S;第二种是 Google 提出的 spark-on-k8s-operator,更符合 K8S 原生概念。

  1. Spark On K8S
  2. spark-on-k8s-operator

(图:Spark On K8S 架构)

(图:spark-on-k8s-operator 架构)

12.1 安装 spark-on-k8s-operator

参考 https://blog.csdn.net/w8998036/article/details/122217230

helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
# 注意是否存在 spark-operator 命名空间,没有则创建
kubectl create ns spark-operator
# 安装
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --set sparkJobNamespace=default --set webhook.enable=true
vim spark-application-rbac.yaml
# 内容如下
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: spark
  name: spark-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: spark
subjects:
- kind: ServiceAccount
  name: spark
  namespace: spark
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io

kubectl create clusterrolebinding root-cluster-admin-binding --clusterrole=cluster-admin --user=root

创建一个Spark作业的yaml配置文件,并进行部署。

  1. 创建spark-pi.yaml文件
apiVersion: "sparkoperator.k8s.io/v1beta2"kind: SparkApplicationmetadata:  name: spark-pi  namespace: sparkspec:  type: Scala  mode: cluster  image: "registry.cn-hangzhou.aliyuncs.com/yudayu/spark:v3.1.1"    # 1gcr.io/spark-operator/spark:v3.1.1需要更换镜像,gcr.io目前国内无法访问。可以先对docker挂代理,pull到阿里云镜像后  imagePullPolicy: IfNotPresent  mainClass: org.apache.spark.examples.SparkPi  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"      # 需要更换自己的jar包,local指该jar位于image内,可换成所有节点都能访问的web路径,或者通过指定nas挂载pv,将jar包放在nas的pv里  sparkVersion: "3.1.1"  restartPolicy:    type: Never  volumes:    - name: "test-volume"      hostPath:        path: "/tmp"        type: Directory  driver:    cores: 1    coreLimit: "1200m"    memory: "512m"    labels:      version: 3.1.1    serviceAccount: spark    volumeMounts:      - name: "test-volume"        mountPath: "/tmp"  executor:    cores: 1    instances: 2    memory: "512m"    labels:      version: 3.1.1    volumeMounts:      - name: "test-volume"        mountPath: "/tmp"
  1. 部署一个Spark计算任务
kubectl apply -f spark-pi.yaml

运维

kubectl get sparkapplications
kubectl describe sparkapplications
kubectl get svc  # 查看该任务的 spark ui
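如果要查看任务的运行日志,driver Pod 的名称通常是「应用名-driver」(示意,这里假设应用名为上文的 spark-pi、命名空间为 spark):

kubectl get pods -n spark
kubectl logs -f spark-pi-driver -n spark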

12.2 安装 Spark On K8S

helm repo add bitnami https://charts.bitnami.com/bitnami
# 注意是否存在 spark-on-k8s 命名空间,没有则创建
kubectl create ns spark-on-k8s
helm install -n spark-on-k8s spark bitnami/spark \
  --set worker.coreLimit=28

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=bitnami/spark:3 \
  --master k8s://https://125.220.153.23:6443 \
  --conf spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://10.97.43.141:7077 \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 10 \
  --conf spark.executor.instances=5 \
  https:///data/spark-examples_2.12-3.3.0.jar 1000

kubectl run --namespace spark-on-k8s spark-oge --rm --tty -i --restart='Never' \
  --image bitnami/spark:3 \
  -- spark-submit --master spark://10.97.43.141:7077 \
  --class org.apache.spark.examples.SparkPi \
  --deploy-mode cluster \
  /data/spark-examples_2.12-3.3.0.jar 100000

13 在K8S上部署redis集群

14 在K8S上部署nginx

14.1 创建pv

vim nginx-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-ydy-pv
  namespace: ydy
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: /mnt/storage/k8s/pv/ydy-nginx-pvc

14.2 创建pvc

vim nginx-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-ydy-pvc
  namespace: ydy
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: manual

14.3 安装nginx并设置静态资源挂载的pvc

将 nginx 中的 /app 挂载到 /mnt/storage/k8s/pv/ydy-nginx-pvc

helm install -n ydy nginx bitnami/nginx \
  --set staticSitePVC=nginx-ydy-pvc
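安装后可以确认 PVC 已绑定、Pod 正常运行,并把一个静态文件放进上面 PV 定义的 hostPath 目录做个简单验证(示意):

kubectl get pvc -n ydy nginx-ydy-pvc
kubectl get pods -n ydy | grep nginx
# 将静态文件放入 PV 对应目录后,通过 nginx 的 Service(NodePort)访问验证
echo 'hello from pv' > /mnt/storage/k8s/pv/ydy-nginx-pvc/index.html
kubectl get svc -n ydy | grep nginx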

附录:疑难问题解决

1 K8S强制删除 namespace(会删除该命名空间中的所有 pod )

# 1、将该命名空间导出为 json 文件,以 oge namespace 为例
kubectl get ns oge -o json > oge.json
# 2、编辑该 json 文件,将 spec 内的内容全部删除,然后保存退出


# 3、另开一个终端,启动一个 proxy
kubectl proxy --port=8081
# 4、执行一个 curl 命令,更新 oge namespace
curl -k -H "Content-Type: application/json" -X PUT --data-binary @oge.json http://127.0.0.1:8081/api/v1/namespaces/oge/finalize

2 CNI网络错误


# 下面我们删除错误的 cni0,然后让它自己重建
ifconfig cni0 down
ip link delete cni0
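删除 cni0 后,可以顺手重启该节点上的 flannel Pod,让网桥按正确网段重建(示意;flannel 的命名空间和标签随部署方式不同,可能是 kube-flannel 或 kube-system、app=flannel):

kubectl delete pod -n kube-flannel -l app=flannel
kubectl get pods -n kube-flannel -o wide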

3 28 服务器增加路由(为了让两台刀片上网)

iptables -t nat -A POSTROUTING -s 192.168.0.209/24 -o em1_2 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 192.168.0.177/24 -o em1_2 -j MASQUERADE
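要让 NAT 转发生效,作为网关的 28 服务器上还需要开启内核转发(前文 2.2.3 的 sysctl 配置中已包含该参数,这里单独列出便于临时开启和确认):

sysctl -w net.ipv4.ip_forward=1
# 确认当前取值
sysctl net.ipv4.ip_forward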

4 异常断电等导致 etcd 心跳检测出现问题

5 OpenStack服务器网络跳转镜像

作用: 保证OpenStack上服务器与实验室服务器可以 ping 通