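Official introduction:
KubeSphere Service Mesh is built on Istio and visualizes microservice governance and traffic management. It comes with a powerful toolkit that includes circuit breaking, blue-green deployment, canary release, traffic mirroring, distributed tracing, observability, and traffic control. KubeSphere Service Mesh supports non-intrusive microservice governance, which helps developers get started quickly and greatly flattens Istio's learning curve. All features of KubeSphere Service Mesh are designed to meet users' business needs.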
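KubeSphere 3.3.0 does not yet support creating grayscale release strategies for multi-cluster apps. A single cluster, or the simplest All-in-one virtual machine, works fine. If you need multi-cluster grayscale release, KubeSphere is not an option for now; you have to build your own K8S cluster plus Istio and related components and work out a solution yourself.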
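Istio official site (Chinese): https://istio.io/latest/zh/
Istio is a widely used service mesh component; its main functions include load balancing and traffic management.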
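KubeSphere official overview: https://kubesphere.com.cn/docs/v3.3/project-user-guide/grayscale-release/overview/
Grayscale release simply means replacing old microservice versions with new ones through different release strategies, so that problems encountered during the upgrade carry less risk and the impact on the prod environment is kept to a minimum.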
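KubeSphere official introduction: https://kubesphere.com.cn/docs/v3.3/project-user-guide/grayscale-release/blue-green-deployment/
Blue-green deployment creates an identical standby environment and runs the new application version in it, which provides an efficient way to release new versions without downtime or service interruption. With this approach, KubeSphere routes all traffic to one of the two versions, so only one environment receives traffic at any given time. If the new build has any problem, you can roll back to the previous version immediately.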
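This release strategy is easy to understand: keep a backup, and if the new version is unstable or falls short on functionality or performance, roll back to the old version immediately. Switching traffic routing in Istio is generally much faster than manually redeploying and relaunching.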
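As a rough illustration of why the switch is fast: in Istio, a blue-green cutover is an update to a routing rule rather than a redeployment. The sketch below assumes a hypothetical VirtualService named reviews whose route points at DestinationRule subsets v1 (old) and v2 (new); these names are not from the KubeSphere docs, and the KubeSphere console normally generates the equivalent rules for you.

```shell
#!/bin/sh
# Sketch only: "reviews", "v1" and "v2" are hypothetical names.
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the command instead of running it

# Print a JSON merge patch that routes 100% of HTTP traffic to one subset.
bluegreen_patch() {
  host="$1"; subset="$2"
  printf '{"spec":{"http":[{"route":[{"destination":{"host":"%s","subset":"%s"},"weight":100}]}]}}\n' "$host" "$subset"
}

# Point the VirtualService at the given subset (v2 to cut over, v1 to roll back).
bluegreen_switch() {
  vs="$1"; host="$2"; subset="$3"
  "$KUBECTL" patch virtualservice "$vs" --type merge -p "$(bluegreen_patch "$host" "$subset")"
}
```

Calling bluegreen_switch reviews reviews v2 would cut over to the new version, and the same call with v1 would roll back; both versions keep running the whole time, so nothing has to be redeployed.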
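KubeSphere official introduction: https://kubesphere.com.cn/docs/v3.3/project-user-guide/grayscale-release/canary-release/
A canary deployment slowly rolls out changes to a small subset of users, minimizing the risk of a version upgrade. Specifically, you can define it on a highly responsive dashboard and choose to expose the new application version to a portion of production traffic. Once the canary deployment is in place, KubeSphere monitors requests and provides a real-time visualized view of the traffic. Throughout the process you can analyze the behavior of the new version and gradually increase the share of traffic sent to it. Once you are confident in the build, you can route all traffic to it.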
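This release strategy resembles the alpha test, closed beta, open beta, and official launch of an online game: a group of selected users tries the new version first, then the test scope is gradually widened until it runs stably.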
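The gradual widening described above corresponds to route weights in an Istio VirtualService. The following is a minimal sketch, assuming a hypothetical service whose DestinationRule already defines subsets v1 (current) and v2 (canary); the function name and the service name used with it are illustrative only.

```shell
#!/bin/sh
# Sketch only: the service name and the subsets v1/v2 are hypothetical.

# Print a VirtualService that sends <percent>% of traffic to subset v2
# and the remainder to subset v1.
canary_vs() {
  host="$1"; percent="$2"
  case "$percent" in
    ''|*[!0-9]*) echo "percent must be an integer" >&2; return 1 ;;
  esac
  [ "$percent" -le 100 ] || { echo "percent must be 0-100" >&2; return 1; }
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: $host
spec:
  hosts:
  - $host
  http:
  - route:
    - destination:
        host: $host
        subset: v1
      weight: $((100 - percent))
    - destination:
        host: $host
        subset: v2
      weight: $percent
EOF
}
```

A rollout could then be a series of calls such as canary_vs reviews 10 | kubectl apply -f -, repeated with a larger percentage each time the new version looks healthy, ending at 100.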
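KubeSphere official introduction: https://kubesphere.com.cn/docs/v3.3/project-user-guide/grayscale-release/traffic-mirroring/
Traffic mirroring copies live production traffic and sends it to a mirror service. By default KubeSphere mirrors all traffic; you can also specify a value to set the percentage of traffic to mirror. The common use cases are listed on the page linked above.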
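This release strategy sends a copy of the same prod traffic to a mirror service, similar to one-to-many message fan-out in an MQ. It only consumes resources such as network bandwidth, CPU time, memory, and disk, so as long as resources are sufficient and no performance bottleneck is hit, the prod environment is not affected. Once the same requests reach the mirror service, prod data can be used for functional testing and load testing.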
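In Istio terms, the fan-out is a mirror entry on the route: the primary destination still serves the response, while a copy of each request goes to the mirror and its response is discarded. A minimal sketch, again assuming a hypothetical service with subsets v1 (serving prod) and v2 (mirror):

```shell
#!/bin/sh
# Sketch only: the service name and the subsets v1/v2 are hypothetical.

# Print a VirtualService that serves all traffic from v1 and mirrors
# <percent>% of requests (default 100) to v2.
mirror_vs() {
  host="$1"; percent="${2:-100}"
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: $host
spec:
  hosts:
  - $host
  http:
  - route:
    - destination:
        host: $host
        subset: v1
      weight: 100
    mirror:
      host: $host
      subset: v2
    mirrorPercentage:
      value: $percent
EOF
}
```

For example, mirror_vs reviews 25 would print a rule that copies a quarter of the requests to v2, which keeps the extra resource usage bounded while testing.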
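KubeKey GitHub documentation (Chinese): https://github.com/kubesphere/kubekey/blob/master/README_zh-CN.md
KubeSphere official documentation: https://kubesphere.com.cn/docs/v3.3/installing-on-linux/introduction/multioverview/
The configuration can be changed before KubeSphere is installed, so that Istio normally starts automatically once the installation finishes. This is done through KubeKey.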
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall# ./kk create config
Generate KubeKey config file successfully
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall# ll
总用量 70344
drwxrwxr-x  3 zhiyong zhiyong     4096 8月  16 23:14 ./
drwxr-xr-x 16 zhiyong zhiyong     4096 8月  16 23:12 ../
-rw-r--r--  1 root    root        1065 8月  16 23:14 config-sample.yaml
-rwxr-xr-x  1 zhiyong zhiyong 54910976 7月  26 14:17 kk*
drwxr-xr-x 12 root    root        4096 8月   8 10:04 kubekey/
-rw-rw-r--  1 zhiyong zhiyong 17102249 8月   8 01:03 kubekey-v2.2.2-linux-amd64.tar.gz
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall# cat config-sample.yaml

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: node1, address: 172.16.0.2, internalAddress: 172.16.0.2, user: ubuntu, password: "Qcloud@123"}
  - {name: node2, address: 172.16.0.3, internalAddress: 172.16.0.3, user: ubuntu, password: "Qcloud@123"}
  roleGroups:
    etcd:
    - node1
    control-plane:
    - node1
    worker:
    - node1
    - node2
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy

    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.23.8
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
The author generated the default yaml config file.
It also needs to contain the following:
servicemesh:
  enabled: true # Change "false" to "true".
  istio: # Customizing the istio installation configuration, refer to https://istio.io/latest/docs/setup/additional-setup/customize-installation/
    components:
      ingressGateways:
      - name: istio-ingressgateway # Exposes services outside the service mesh. Disabled by default.
        enabled: false
      cni:
        enabled: false # When enabled, pod traffic redirection for the Istio mesh is set up during the network setup phase of the Kubernetes pod lifecycle.
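Since a cluster installation takes a while, it can be worth confirming that the switch is really set before starting it. The helper below is a small sketch, not part of KubeKey: the function name is made up, and it only checks the first enabled: line that follows the servicemesh: key, which matches the layout shown above.

```shell
#!/bin/sh
# Sketch only: servicemesh_enabled is a hypothetical helper, not a KubeKey command.

# Succeed (exit 0) if the first "enabled:" line after "servicemesh:" is true.
servicemesh_enabled() {
  awk '
    /^[ \t]*servicemesh:/ { in_block = 1; next }
    in_block && /^[ \t]*enabled:/ {
      found = ($2 == "true")
      exit
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

For example, servicemesh_enabled config-sample.yaml && echo "service mesh will be installed".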
Then run:
./kk create cluster -f config-sample.yaml
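KubeKey automatically creates a K8S cluster that includes the Istio components on the Linux server.
Istio, including its optional CNI plugin, can of course also be installed manually.
Istio installation documentation (Chinese): https://istio.io/latest/zh/docs/setup/additional-setup/cni/
Because KubeSphere can be installed either on Linux servers or directly on an existing K8S cluster (where it runs in pods), you can also enable the Istio service mesh component while installing KubeSphere on a cluster you already have. The procedure is much the same.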
vim cluster-configuration.yaml --- apiVersion: installer.kubesphere.io/v1alpha1 kind: ClusterConfiguration metadata: name: ks-installer namespace: kubesphere-system labels: version: v3.3.0 spec: persistence: storageClass: "" # If there is no default StorageClass in your cluster, you need to specify an existing StorageClass here. authentication: jwtSecret: "" # Keep the jwtSecret consistent with the Host Cluster. Retrieve the jwtSecret by executing "kubectl -n kubesphere-system get cm kubesphere-config -o yaml | grep -v "apiVersion" | grep jwtSecret" on the Host Cluster. local_registry: "" # Add your private registry address if it is needed. # dev_tag: "" # Add your kubesphere image tag you want to install, by default it's same as ks-installer release version. etcd: monitoring: false # Enable or disable etcd monitoring dashboard installation. You have to create a Secret for etcd before you enable it. endpointIps: localhost # etcd cluster EndpointIps. It can be a bunch of IPs here. port: 2379 # etcd port. tlsEnable: true common: core: console: enableMultiLogin: true # Enable or disable simultaneous logins. It allows different users to log in with the same account at the same time. port: 30880 type: NodePort # apiserver: # Enlarge the apiserver and controller manager's resource requests and limits for the large cluster # resources: {} # controllerManager: # resources: {} redis: enabled: false enableHA: false volumeSize: 2Gi # Redis PVC size. openldap: enabled: false volumeSize: 2Gi # openldap PVC size. minio: volumeSize: 20Gi # Minio PVC size. monitoring: # type: external # Whether to specify the external prometheus stack, and need to modify the endpoint at the next line. endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090 # Prometheus endpoint to get metrics data. GPUMonitoring: # Enable or disable the GPU-related metrics. If you enable this switch but have no GPU resources, Kubesphere will set it to zero. 
enabled: false gpu: # Install GPUKinds. The default GPU kind is nvidia.com/gpu. Other GPU kinds can be added here according to your needs. kinds: - resourceName: "nvidia.com/gpu" resourceType: "GPU" default: true es: # Storage backend for logging, events and auditing. # master: # volumeSize: 4Gi # The volume size of Elasticsearch master nodes. # replicas: 1 # The total number of master nodes. Even numbers are not allowed. # resources: {} # data: # volumeSize: 20Gi # The volume size of Elasticsearch data nodes. # replicas: 1 # The total number of data nodes. # resources: {} logMaxAge: 7 # Log retention time in built-in Elasticsearch. It is 7 days by default. elkPrefix: logstash # The string making up index names. The index name will be formatted as ks-<elk_prefix>-log. basicAuth: enabled: false username: "" password: "" externalElasticsearchHost: "" externalElasticsearchPort: "" alerting: # (CPU: 0.1 Core, Memory: 100 MiB) It enables users to customize alerting policies to send messages to receivers in time with different time intervals and alerting levels to choose from. enabled: false # Enable or disable the KubeSphere Alerting System. # thanosruler: # replicas: 1 # resources: {} auditing: # Provide a security-relevant chronological set of records,recording the sequence of activities happening on the platform, initiated by different tenants. enabled: false # Enable or disable the KubeSphere Auditing Log System. # operator: # resources: {} # webhook: # resources: {} devops: # (CPU: 0.47 Core, Memory: 8.6 G) Provide an out-of-the-box CI/CD system based on Jenkins, and automated workflow tools including Source-to-Image & Binary-to-Image. enabled: false # Enable or disable the KubeSphere DevOps System. # resources: {} jenkinsMemoryLim: 2Gi # Jenkins memory limit. jenkinsMemoryReq: 1500Mi # Jenkins memory request. jenkinsVolumeSize: 8Gi # Jenkins volume size. jenkinsJavaOpts_Xms: 1200m # The following three fields are JVM parameters. 
jenkinsJavaOpts_Xmx: 1600m jenkinsJavaOpts_MaxRAM: 2g events: # Provide a graphical web console for Kubernetes Events exporting, filtering and alerting in multi-tenant Kubernetes clusters. enabled: false # Enable or disable the KubeSphere Events System. # operator: # resources: {} # exporter: # resources: {} # ruler: # enabled: true # replicas: 2 # resources: {} logging: # (CPU: 57 m, Memory: 2.76 G) Flexible logging functions are provided for log query, collection and management in a unified console. Additional log collectors can be added, such as Elasticsearch, Kafka and Fluentd. enabled: false # Enable or disable the KubeSphere Logging System. logsidecar: enabled: true replicas: 2 # resources: {} metrics_server: # (CPU: 56 m, Memory: 44.35 MiB) It enables HPA (Horizontal Pod Autoscaler). enabled: false # Enable or disable metrics-server. monitoring: storageClass: "" # If there is an independent StorageClass you need for Prometheus, you can specify it here. The default StorageClass is used by default. node_exporter: port: 9100 # resources: {} # kube_rbac_proxy: # resources: {} # kube_state_metrics: # resources: {} # prometheus: # replicas: 1 # Prometheus replicas are responsible for monitoring different segments of data source and providing high availability. # volumeSize: 20Gi # Prometheus PVC size. # resources: {} # operator: # resources: {} # alertmanager: # replicas: 1 # AlertManager Replicas. # resources: {} # notification_manager: # resources: {} # operator: # resources: {} # proxy: # resources: {} gpu: # GPU monitoring-related plug-in installation. nvidia_dcgm_exporter: # Ensure that gpu resources on your hosts can be used normally, otherwise this plug-in will not work properly. enabled: false # Check whether the labels on the GPU hosts contain "nvidia.com/gpu.present=true" to ensure that the DCGM pod is scheduled to these nodes. 
# resources: {} multicluster: clusterRole: none # host | member | none # You can install a solo cluster, or specify it as the Host or Member Cluster. network: networkpolicy: # Network policies allow network isolation within the same cluster, which means firewalls can be set up between certain instances (Pods). # Make sure that the CNI network plugin used by the cluster supports NetworkPolicy. There are a number of CNI network plugins that support NetworkPolicy, including Calico, Cilium, Kube-router, Romana and Weave Net. enabled: false # Enable or disable network policies. ippool: # Use Pod IP Pools to manage the Pod network address space. Pods to be created can be assigned IP addresses from a Pod IP Pool. type: none # Specify "calico" for this field if Calico is used as your CNI plugin. "none" means that Pod IP Pools are disabled. topology: # Use Service Topology to view Service-to-Service communication based on Weave Scope. type: none # Specify "weave-scope" for this field to enable Service Topology. "none" means that Service Topology is disabled. openpitrix: # An App Store that is accessible to all platform tenants. You can use it to manage apps across their entire lifecycle. store: enabled: false # Enable or disable the KubeSphere App Store. servicemesh: # (0.3 Core, 300 MiB) Provide fine-grained traffic management, observability and tracing, and visualized traffic topology. enabled: false # Base component (pilot). Enable or disable KubeSphere Service Mesh (Istio-based). istio: # Customizing the istio installation configuration, refer to https://istio.io/latest/docs/setup/additional-setup/customize-installation/ components: ingressGateways: - name: istio-ingressgateway enabled: false cni: enabled: false edgeruntime: # Add edge nodes to your cluster and deploy workloads on edge nodes. 
enabled: false kubeedge: # kubeedge configurations enabled: false cloudCore: cloudHub: advertiseAddress: # At least a public IP address or an IP address which can be accessed by edge nodes must be provided. - "" # Note that once KubeEdge is enabled, CloudCore will malfunction if the address is not provided. service: cloudhubNodePort: "30000" cloudhubQuicNodePort: "30001" cloudhubHttpsNodePort: "30002" cloudstreamNodePort: "30003" tunnelNodePort: "30004" # resources: {} # hostNetWork: false iptables-manager: enabled: true mode: "external" # resources: {} # edgeService: # resources: {} gatekeeper: # Provide admission policy and rule management, A validating (mutating TBA) webhook that enforces CRD-based policies executed by Open Policy Agent. enabled: false # Enable or disable Gatekeeper. # controller_manager: # resources: {} # audit: # resources: {} terminal: # image: 'alpine:3.15' # There must be an nsenter program in the image timeout: 600 # Container timeout, if set to 0, no timeout will be used. The unit is seconds
Again, this is the section to modify:
servicemesh:
  enabled: true # Change "false" to "true".
  istio: # Customizing the istio installation configuration, refer to https://istio.io/latest/docs/setup/additional-setup/customize-installation/
    components:
      ingressGateways:
      - name: istio-ingressgateway # Exposes services outside the service mesh. Disabled by default.
        enabled: false
      cni:
        enabled: false # When enabled, pod traffic redirection for the Istio mesh is set up during the network setup phase of the Kubernetes pod lifecycle.
Then:
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.3.0/kubesphere-installer.yaml
kubectl apply -f cluster-configuration.yaml
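K8S installs KubeSphere automatically according to the settings in the yaml files, and installs and configures Istio along with it.
Because the Istio components run in K8S pods, Istio can be started wherever a K8S environment exists. Once KubeSphere is installed, whether it was installed on Linux or on an existing K8S cluster, and whether the K8S cluster is All-in-one or multi-node, enabling Istio from within KubeSphere is easy.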
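One way to flip the switch on an installed system from the command line is to patch the ks-installer ClusterConfiguration directly. This is a sketch under assumptions: it relies on the resource being addressable as clusterconfiguration in the kubesphere-system namespace (as in the manifest shown earlier) and on ks-installer reacting to the change, which is how the author's console-based edit behaves.

```shell
#!/bin/sh
# Sketch only: assumes the ClusterConfiguration "ks-installer" in namespace
# kubesphere-system, as in the manifest shown in this article.
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the command instead of running it

# Set spec.servicemesh.enabled to true with a JSON merge patch.
enable_servicemesh() {
  "$KUBECTL" patch clusterconfiguration ks-installer \
    --namespace kubesphere-system --type merge \
    -p '{"spec":{"servicemesh":{"enabled":true}}}'
}
```

A merge patch only touches the named field, so the rest of the configuration stays as it is and there is no indentation to get wrong.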
Using the author's All-in-one environment: https://lizhiyong.blog.csdn.net/article/details/126236516
First log in as the administrator:
http://192.168.88.20:30880
admin
Aa123456
Platform → Cluster Management:
Under CRDs, search for clusterconf:
After opening this ClusterConfiguration:
The yaml can be edited.
The current content of this yaml:
apiVersion: installer.kubesphere.io/v1alpha1 kind: ClusterConfiguration metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: > {"apiVersion":"installer.kubesphere.io/v1alpha1","kind":"ClusterConfiguration","metadata":{"annotations":{},"labels":{"version":"v3.3.0"},"name":"ks-installer","namespace":"kubesphere-system"},"spec":{"alerting":{"enabled":false},"auditing":{"enabled":false},"authentication":{"jwtSecret":""},"common":{"core":{"console":{"enableMultiLogin":true,"port":30880,"type":"NodePort"}},"es":{"basicAuth":{"enabled":false,"password":"","username":""},"elkPrefix":"logstash","externalElasticsearchHost":"","externalElasticsearchPort":"","logMaxAge":7},"gpu":{"kinds":[{"default":true,"resourceName":"nvidia.com/gpu","resourceType":"GPU"}]},"minio":{"volumeSize":"20Gi"},"monitoring":{"GPUMonitoring":{"enabled":false},"endpoint":"http://prometheus-operated.kubesphere-monitoring-system.svc:9090"},"openldap":{"enabled":false,"volumeSize":"2Gi"},"redis":{"enabled":false,"volumeSize":"2Gi"}},"devops":{"enabled":false,"jenkinsJavaOpts_MaxRAM":"2g","jenkinsJavaOpts_Xms":"1200m","jenkinsJavaOpts_Xmx":"1600m","jenkinsMemoryLim":"2Gi","jenkinsMemoryReq":"1500Mi","jenkinsVolumeSize":"8Gi"},"edgeruntime":{"enabled":false,"kubeedge":{"cloudCore":{"cloudHub":{"advertiseAddress":[""]},"service":{"cloudhubHttpsNodePort":"30002","cloudhubNodePort":"30000","cloudhubQuicNodePort":"30001","cloudstreamNodePort":"30003","tunnelNodePort":"30004"}},"enabled":false,"iptables-manager":{"enabled":true,"mode":"external"}}},"etcd":{"endpointIps":"192.168.88.20","monitoring":false,"port":2379,"tlsEnable":true},"events":{"enabled":false},"logging":{"enabled":false,"logsidecar":{"enabled":true,"replicas":2}},"metrics_server":{"enabled":false},"monitoring":{"gpu":{"nvidia_dcgm_exporter":{"enabled":false}},"node_exporter":{"port":9100},"storageClass":""},"multicluster":{"clusterRole":"none"},"network":{"ippool":{"type":"none"},"networkpolicy":{"enabled":false},"topology":{
"type":"none"}},"openpitrix":{"store":{"enabled":false}},"persistence":{"storageClass":""},"servicemesh":{"enabled":false,"istio":{"components":{"cni":{"enabled":false},"ingressGateways":[{"enabled":false,"name":"istio-ingressgateway"}]}}},"terminal":{"timeout":600},"zone":"cn"}} labels: version: v3.3.0 name: ks-installer namespace: kubesphere-system spec: alerting: enabled: false auditing: enabled: false authentication: jwtSecret: '' common: core: console: enableMultiLogin: true port: 30880 type: NodePort es: basicAuth: enabled: false password: '' username: '' elkPrefix: logstash externalElasticsearchHost: '' externalElasticsearchPort: '' logMaxAge: 7 gpu: kinds: - default: true resourceName: nvidia.com/gpu resourceType: GPU minio: volumeSize: 20Gi monitoring: GPUMonitoring: enabled: false endpoint: 'http://prometheus-operated.kubesphere-monitoring-system.svc:9090' openldap: enabled: false volumeSize: 2Gi redis: enabled: false volumeSize: 2Gi devops: enabled: false jenkinsJavaOpts_MaxRAM: 2g jenkinsJavaOpts_Xms: 1200m jenkinsJavaOpts_Xmx: 1600m jenkinsMemoryLim: 2Gi jenkinsMemoryReq: 1500Mi jenkinsVolumeSize: 8Gi edgeruntime: enabled: false kubeedge: cloudCore: cloudHub: advertiseAddress: - '' service: cloudhubHttpsNodePort: '30002' cloudhubNodePort: '30000' cloudhubQuicNodePort: '30001' cloudstreamNodePort: '30003' tunnelNodePort: '30004' enabled: false iptables-manager: enabled: true mode: external etcd: endpointIps: 192.168.88.20 monitoring: false port: 2379 tlsEnable: true events: enabled: false logging: enabled: false logsidecar: enabled: true replicas: 2 metrics_server: enabled: false monitoring: gpu: nvidia_dcgm_exporter: enabled: false node_exporter: port: 9100 storageClass: '' multicluster: clusterRole: none network: ippool: type: none networkpolicy: enabled: false topology: type: none openpitrix: store: enabled: false persistence: storageClass: '' servicemesh: enabled: false istio: components: cni: enabled: false ingressGateways: - enabled: false name: 
istio-ingressgateway terminal: timeout: 600 zone: cn
According to the official documentation, the servicemesh section near the end should be changed to:
servicemesh:
  enabled: true # Change "false" to "true".
  istio: # Customizing the istio installation configuration, refer to https://istio.io/latest/docs/setup/additional-setup/customize-installation/
    components:
      ingressGateways:
      - name: istio-ingressgateway # Exposes services outside the service mesh. Disabled by default.
        enabled: false
      cni:
        enabled: false # When enabled, pod traffic redirection for the Istio mesh is set up during the network setup phase of the Kubernetes pod lifecycle.
Per the YAML specification, the space between the colon and true must not be left out!
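The reason is that YAML only treats a colon as a key/value separator when it is followed by whitespace, so enabled:true is read as one plain string rather than a key with a boolean value. A quick check before saving can catch this; the function below is a hypothetical helper that uses grep to flag lines where a key's colon is immediately followed by a non-space character.

```shell
#!/bin/sh
# Sketch only: missing_space_after_colon is a hypothetical helper.

# Print (with line numbers) lines such as "enabled:true" where a mapping key
# is followed by ":" and then directly by the value. Reads files or stdin.
missing_space_after_colon() {
  grep -nE '^[[:space:]]*(-[[:space:]]+)?[A-Za-z_][A-Za-z0-9_-]*:[^[:space:]]' "$@"
}
```

Values that themselves contain a colon, such as URLs after a properly spaced key, are not flagged, because only the first key on each line is examined.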
After confirming and saving, you can follow the installation of the Istio components [on Ubuntu 20.04, switch to the root user to run this]:
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
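Then:
The all-too-familiar scene: one core struggles while the other 15 cheer it on... If you can afford it, go for a CPU with a high clock speed, although core count matters too.
top shows python3 using 99.7% of the CPU...
After a while: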
Waiting for all tasks to be completed ...
task network status is successful  (1/5)
task openpitrix status is successful  (2/5)
task multicluster status is successful  (3/5)
task monitoring status is successful  (4/5)
task servicemesh status is successful  (5/5)
**************************************************
Collecting installation results ...
#####################################################
###              Welcome to KubeSphere!           ###
#####################################################

Console: http://192.168.88.20:30880
Account: admin
Password: P@88w0rd

NOTES:
  1. After you log into the console, please check the
     monitoring status of service components in
     "Cluster Management". If any service is not
     ready, please wait patiently until all components
     are up and running.
  2. Please change the default password after login.
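This means the installation and initialization of Istio have finished. Ignore the initial password shown here; you still log in with the password you already changed.
In the web UI you can see:
Istio now appears among the system components. But clicking into it shows:
Not only is Istio abnormal, but Prometheus, which was healthy before, has become abnormal as well:
Run: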
root@zhiyong-ksp1:/home/zhiyong# kubectl get pod -n istio-system
NAME                              READY   STATUS              RESTARTS   AGE
istiod-1-11-2-54dd699c87-99krn    0/1     ContainerCreating   0          27m
jaeger-operator-fccc48b86-vtcr8   0/1     ContainerCreating   0          7m10s
kiali-operator-c459985f7-sttfs    0/1     ContainerCreating   0          7m5s
root@zhiyong-ksp1:/home/zhiyong# kubectl get pod --all-namespaces
NAMESPACE                      NAME                                                              READY   STATUS              RESTARTS     AGE
istio-system                   istiod-1-11-2-54dd699c87-99krn                                    0/1     ContainerCreating   0            30m
istio-system                   jaeger-operator-fccc48b86-vtcr8                                   0/1     ContainerCreating   0            9m53s
istio-system                   kiali-operator-c459985f7-sttfs                                    0/1     ContainerCreating   0            9m48s
kube-system                    calico-kube-controllers-f9f9bbcc9-2v7lm                           1/1     Running             1 (8d ago)   8d
kube-system                    calico-node-4mgc7                                                 1/1     Running             1 (8d ago)   8d
kube-system                    coredns-f657fccfd-2gw7h                                           1/1     Running             1 (8d ago)   8d
kube-system                    coredns-f657fccfd-pflwf                                           1/1     Running             1 (8d ago)   8d
kube-system                    kube-apiserver-zhiyong-ksp1                                       1/1     Running             1 (8d ago)   8d
kube-system                    kube-controller-manager-zhiyong-ksp1                              1/1     Running             1 (8d ago)   8d
kube-system                    kube-proxy-cn68l                                                  1/1     Running             1 (8d ago)   8d
kube-system                    kube-scheduler-zhiyong-ksp1                                       1/1     Running             1 (8d ago)   8d
kube-system                    nodelocaldns-96gtw                                                1/1     Running             1 (8d ago)   8d
kube-system                    openebs-localpv-provisioner-68db4d895d-p9527                      1/1     Running             0            8d
kube-system                    snapshot-controller-0                                             1/1     Running             1 (8d ago)   8d
kubesphere-controls-system     default-http-backend-587748d6b4-ccg59                             1/1     Running             1 (8d ago)   8d
kubesphere-controls-system     kubectl-admin-5d588c455b-82cnk                                    1/1     Running             1 (8d ago)   8d
kubesphere-logging-system      elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk   0/1     ContainerCreating   0            15m
kubesphere-logging-system      elasticsearch-logging-data-0                                      0/1     Pending             0            32m
kubesphere-logging-system      elasticsearch-logging-discovery-0                                 0/1     Pending             0            32m
kubesphere-monitoring-system   alertmanager-main-0                                               2/2     Running             2 (8d ago)   8d
kubesphere-monitoring-system   kube-state-metrics-6d6786b44-bbb4f                                3/3     Running             3 (8d ago)   8d
kubesphere-monitoring-system   node-exporter-8sz74                                               2/2     Running             2 (8d ago)   8d
kubesphere-monitoring-system   notification-manager-deployment-6f8c66ff88-pt4l8                  2/2     Running             2 (8d ago)   8d
kubesphere-monitoring-system   notification-manager-operator-6455b45546-nkmx8                    2/2     Running             2 (8d ago)   8d
kubesphere-monitoring-system   prometheus-k8s-0                                                  0/2     Terminating         0            8d
kubesphere-monitoring-system   prometheus-operator-66d997dccf-c968c                              2/2     Running             2 (8d ago)   8d
kubesphere-system              ks-apiserver-6b9bcb86f4-hsdzs                                     1/1     Running             1 (8d ago)   8d
kubesphere-system              ks-console-599c49d8f6-ngb6b                                       1/1     Running             1 (8d ago)   8d
kubesphere-system              ks-controller-manager-66747fcddc-r7cpt                            1/1     Running             1 (8d ago)   8d
kubesphere-system              ks-installer-5fd8bd46b8-dzhbb                                     1/1     Running             1 (8d ago)   8d
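With this many pods it is easy to miss the unhealthy ones. A small filter over the listing can pick out every pod whose STATUS column is not Running; the function name below is made up for illustration, and it relies only on the column order shown in the listing above (NAMESPACE, NAME, READY, STATUS).

```shell
#!/bin/sh
# Sketch only: not_running is a hypothetical helper.

# Read `kubectl get pod --all-namespaces` output on stdin and print
# "namespace/name status" for every pod whose STATUS is not Running.
not_running() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2, $4 }'
}
```

It would be used as kubectl get pod --all-namespaces | not_running. Note that pods of finished jobs (STATUS Completed) would also be listed, which is harmless for this kind of check.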
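Wait patiently for a while...
The monitoring view in the KubeSphere web UI shows that the pods are still in the ContainerCreating state. But they cannot stay like this forever...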
root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod istiod-1-11-2-54dd699c87-99krn -n istio-system Name: istiod-1-11-2-54dd699c87-99krn Namespace: istio-system Priority: 0 Node: zhiyong-ksp1/192.168.88.20 Start Time: Wed, 17 Aug 2022 00:44:55 +0800 Labels: app=istiod install.operator.istio.io/owning-resource=unknown istio=istiod istio.io/rev=1-11-2 operator.istio.io/component=Pilot pod-template-hash=54dd699c87 sidecar.istio.io/inject=false Annotations: prometheus.io/port: 15014 prometheus.io/scrape: true sidecar.istio.io/inject: false Status: Pending IP: IPs: <none> Controlled By: ReplicaSet/istiod-1-11-2-54dd699c87 Containers: discovery: Container ID: Image: registry.cn-beijing.aliyuncs.com/kubesphereio/pilot:1.11.1 Image ID: Ports: 8080/TCP, 15010/TCP, 15017/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP Args: discovery --monitoringAddr=:15014 --log_output_level=default:info --domain cluster.local --keepaliveMaxServerConnectionAge 30m State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 500m memory: 2Gi Readiness: http-get http://:8080/ready delay=1s timeout=5s period=3s #success=1 #failure=3 Environment: REVISION: 1-11-2 JWT_POLICY: first-party-jwt PILOT_CERT_PROVIDER: istiod POD_NAME: istiod-1-11-2-54dd699c87-99krn (v1:metadata.name) POD_NAMESPACE: istio-system (v1:metadata.namespace) SERVICE_ACCOUNT: (v1:spec.serviceAccountName) KUBECONFIG: /var/run/secrets/remote/config ENABLE_LEGACY_FSGROUP_INJECTION: false PILOT_TRACE_SAMPLING: 1 PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_OUTBOUND: true PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_INBOUND: true ISTIOD_ADDR: istiod-1-11-2.istio-system.svc:15012 PILOT_ENABLE_ANALYSIS: false CLUSTER_ID: Kubernetes Mounts: /etc/cacerts from cacerts (ro) /var/run/secrets/istio-dns from local-certs (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-l54jm (ro) /var/run/secrets/remote from istio-kubeconfig (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True 
Volumes: local-certs: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: Memory SizeLimit: <unset> cacerts: Type: Secret (a volume populated by a Secret) SecretName: cacerts Optional: true istio-kubeconfig: Type: Secret (a volume populated by a Secret) SecretName: istio-kubeconfig Optional: true kube-api-access-l54jm: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true QoS Class: Burstable Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 43m default-scheduler Successfully assigned istio-system/istiod-1-11-2-54dd699c87-99krn to zhiyong-ksp1 Warning FailedCreatePodSandBox 43m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5d0a3bdb6dea937aa5b118bbd00305a1542111c97af84a3cbdd8f188b1681687": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 43m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ff84de82acfd944be7f3804c96f39ab976ae4d6810b7e0364c90560a4b4070e7": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 42m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "6337bea6f7c16cd9adcff0d2b75238beb4365dc4b880d4c8e4f4535885d59d30": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 42m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox 
"42e08603d4d7e7d1713eecbb21af258022e3fb50c6f5611808b3e2755d50d980": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 42m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "51a6b5b8ea5a63f4be828a0c855802e42640324c440fcc3487c535123d7b3372": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 42m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "dada948b2a416a0ec925b7f67a101b8fd48fdad9fb20d6c41eaf1bbad0a18e57": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 41m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "df3487e020c1e7eb527cc0fce1fe990873bd20f46cbf04de99005e0da5896abe": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 41m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "92e739549a96aa03ea864188abc1b91c9a45394dae28ad97234fa1caf4d52240": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 41m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bc5d1999a2d5ad4d7cf5c1e1c3c7c1a80dee02b806d0be2e15c326e2d82f4af5": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 3m2s (x176 over 41m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox 
"be41a317c2e14b4096f2f8f0d4bfaa8a80572f7365ab3d92c20be75fe97304f4": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
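Clearly the pod network setup failed authorization: the Calico CNI plugin received an Unauthorized response when fetching ClusterInformation, so the pod sandbox could not be created.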
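Rather than reading through a full describe output, the relevant events can be listed on their own. The sketch below uses the event reason seen in the output above (FailedCreatePodSandBox) as a field selector; the function name is hypothetical and the namespace defaults to istio-system.

```shell
#!/bin/sh
# Sketch only: sandbox_failures is a hypothetical helper.
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the command instead of running it

# List FailedCreatePodSandBox events in a namespace (default: istio-system).
sandbox_failures() {
  "$KUBECTL" get events --namespace "${1:-istio-system}" \
    --field-selector reason=FailedCreatePodSandBox
}
```

Running it for several namespaces shows whether the failure is limited to the new Istio pods or affects every pod created since the problem started.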
Next, describe the Prometheus pod as well:
root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod prometheus-k8s-0 -n kubesphere-monitoring-system Name: prometheus-k8s-0 Namespace: kubesphere-monitoring-system Priority: 0 Node: zhiyong-ksp1/192.168.88.20 Start Time: Mon, 08 Aug 2022 20:42:21 +0800 Labels: app.kubernetes.io/component=prometheus app.kubernetes.io/instance=k8s app.kubernetes.io/managed-by=prometheus-operator app.kubernetes.io/name=prometheus app.kubernetes.io/part-of=kube-prometheus app.kubernetes.io/version=2.34.0 controller-revision-hash=prometheus-k8s-557cc865c4 operator.prometheus.io/name=k8s operator.prometheus.io/shard=0 prometheus=k8s statefulset.kubernetes.io/pod-name=prometheus-k8s-0 Annotations: cni.projectcalico.org/containerID: 1d4064f425cad8043d3b38e60155e778e9a1390bc2486b76ac29ad14fb589b40 cni.projectcalico.org/podIP: 10.233.107.36/32 cni.projectcalico.org/podIPs: 10.233.107.36/32 kubectl.kubernetes.io/default-container: prometheus Status: Terminating (lasts 41m) Termination Grace Period: 600s IP: 10.233.107.36 IPs: IP: 10.233.107.36 Controlled By: StatefulSet/prometheus-k8s Init Containers: init-config-reloader: Container ID: containerd://f29630d87dccf60dc8bd065f53ad5187d2f7600a35500a4fa4bfd71a2118daa6 Image: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader:v0.55.1 Image ID: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader@sha256:7743c7ef48f9c0ae6f5c0de4b26e7ff6ae9ece4917a4e139acb21a0d8e77aa3c Port: 8080/TCP Host Port: 0/TCP Command: /bin/prometheus-config-reloader Args: --watch-interval=0 --listen-address=:8080 --config-file=/etc/prometheus/config/prometheus.yaml.gz --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0 State: Terminated Reason: Completed Exit Code: 0 Started: Mon, 08 Aug 2022 20:42:21 +0800 Finished: Mon, 08 Aug 2022 20:42:22 +0800 Ready: True Restart Count: 0 Limits: cpu: 100m memory: 50Mi Requests: cpu: 100m memory: 50Mi Environment: 
POD_NAME: prometheus-k8s-0 (v1:metadata.name) SHARD: 0 Mounts: /etc/prometheus/config from config (rw) /etc/prometheus/config_out from config-out (rw) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vcb4c (ro) Containers: prometheus: Container ID: containerd://2b913fb7dadcc7342759437d2068d0a9cbdcd96fadbb567c0ca5212ca72fb372 Image: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus:v2.34.0 Image ID: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus@sha256:b37103e03399e90c9b7b1b2940894d3634915cf9df4aa2e5402bd85b4377808c Port: 9090/TCP Host Port: 0/TCP Args: --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --storage.tsdb.retention.time=7d --config.file=/etc/prometheus/config_out/prometheus.env.yaml --storage.tsdb.path=/prometheus --web.enable-lifecycle --query.max-concurrency=1000 --web.route-prefix=/ --web.config.file=/etc/prometheus/web_config/web-config.yaml State: Terminated Reason: Completed Exit Code: 0 Started: Mon, 08 Aug 2022 20:42:51 +0800 Finished: Wed, 17 Aug 2022 00:45:37 +0800 Ready: False Restart Count: 0 Limits: cpu: 4 memory: 16Gi Requests: cpu: 200m memory: 400Mi Liveness: http-get http://:web/-/healthy delay=0s timeout=3s period=5s #success=1 #failure=6 Readiness: http-get http://:web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=3 Startup: http-get http://:web/-/ready delay=0s timeout=3s period=15s #success=1 #failure=60 Environment: <none> Mounts: /etc/prometheus/certs from tls-assets (ro) /etc/prometheus/config_out from config-out (ro) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml") /prometheus from prometheus-k8s-db (rw,path="prometheus-db") /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vcb4c (ro) config-reloader: 
Container ID: containerd://215303f25ece01ad28e56a8d94c19b00cbd9429d10cddc1b1db9981802e74011 Image: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader:v0.55.1 Image ID: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader@sha256:7743c7ef48f9c0ae6f5c0de4b26e7ff6ae9ece4917a4e139acb21a0d8e77aa3c Port: 8080/TCP Host Port: 0/TCP Command: /bin/prometheus-config-reloader Args: --listen-address=:8080 --reload-url=http://localhost:9090/-/reload --config-file=/etc/prometheus/config/prometheus.yaml.gz --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0 State: Terminated Reason: Error Message: level=info ts=2022-08-08T12:42:51.99954274Z caller=main.go:111 msg="Starting prometheus-config-reloader" version="(version=0.55.1, branch=refs/tags/v0.55.1, revision=08c846115c67195bc821018168040db6f3e236e3)" level=info ts=2022-08-08T12:42:51.999646088Z caller=main.go:112 build_context="(go=go1.17.7, user=Action-Run-ID-2045821452, date=20220326-21:47:32)" level=info ts=2022-08-08T12:42:52.093230589Z caller=main.go:149 msg="Starting web server for metrics" listen=:8080 level=info ts=2022-08-08T12:42:52.195172719Z caller=reloader.go:373 msg="Reload triggered" cfg_in=/etc/prometheus/config/prometheus.yaml.gz cfg_out=/etc/prometheus/config_out/prometheus.env.yaml watched_dirs=/etc/prometheus/rules/prometheus-k8s-rulefiles-0 level=info ts=2022-08-08T12:42:52.195306486Z caller=reloader.go:235 msg="started watching config file and directories for changes" cfg=/etc/prometheus/config/prometheus.yaml.gz out=/etc/prometheus/config_out/prometheus.env.yaml dirs=/etc/prometheus/rules/prometheus-k8s-rulefiles-0 Exit Code: 2 Started: Mon, 08 Aug 2022 20:42:51 +0800 Finished: Wed, 17 Aug 2022 00:45:36 +0800 Ready: False Restart Count: 0 Limits: cpu: 100m memory: 50Mi Requests: cpu: 100m memory: 50Mi Environment: POD_NAME: prometheus-k8s-0 (v1:metadata.name) SHARD: 0 Mounts: 
/etc/prometheus/config from config (rw) /etc/prometheus/config_out from config-out (rw) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vcb4c (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: prometheus-k8s-db: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: prometheus-k8s-db-prometheus-k8s-0 ReadOnly: false config: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s Optional: false tls-assets: Type: Projected (a volume that contains injected data from multiple sources) SecretName: prometheus-k8s-tls-assets-0 SecretOptionalName: <nil> config-out: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> prometheus-k8s-rulefiles-0: Type: ConfigMap (a volume populated by a ConfigMap) Name: prometheus-k8s-rulefiles-0 Optional: false web-config: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-web-config Optional: false kube-api-access-vcb4c: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true QoS Class: Burstable Node-Selectors: kubernetes.io/os=linux Tolerations: dedicated=monitoring:NoSchedule node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Killing 51m kubelet Stopping container prometheus Normal Killing 51m kubelet Stopping container config-reloader Warning FailedKillPod 63s (x231 over 51m) kubelet error killing pod: failed to "KillPodSandbox" for "35e28d63-59c1-4860-a9bc-924123478928" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox 
\"1d4064f425cad8043d3b38e60155e778e9a1390bc2486b76ac29ad14fb589b40\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
Judging from the error log, the problem almost certainly lies with Calico.
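To make that diagnosis concrete, the kubelet event text can be picked apart mechanically. The following is only a minimal Python sketch and was not part of the original troubleshooting; the function name `parse_cni_failure` and the regular expression are my own assumptions, written against the message format seen in the events above.

```python
import re

# Pattern written against the kubelet event text shown above. The quotes
# around the plugin name may or may not be backslash-escaped in the output.
CNI_FAILURE = re.compile(
    r'plugin type=\\?"(?P<plugin>[^"\\]+)\\?" failed '
    r'\((?P<operation>\w+)\): (?P<reason>.+)'
)


def parse_cni_failure(message):
    """Return (plugin, operation, reason), or None if the text does not match."""
    match = CNI_FAILURE.search(message)
    if match is None:
        return None
    reason = match.group("reason").strip().rstrip('"')
    return match.group("plugin"), match.group("operation"), reason


if __name__ == "__main__":
    event = (
        'failed to destroy network for sandbox "<sandbox-id>": '
        'plugin type="calico" failed (delete): error getting '
        'ClusterInformation: connection is unauthorized: Unauthorized'
    )
    print(parse_cni_failure(event))
```

The plugin name and the `unauthorized` reason are the two pieces that point at Calico's credentials rather than at Prometheus itself.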
root@zhiyong-ksp1:/etc/cni/net.d# pwd /etc/cni/net.d root@zhiyong-ksp1:/etc/cni/net.d# ll 总用量 16 drwxr-xr-x 2 kube root 4096 8月 8 10:05 ./ drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../ -rw-r--r-- 1 root root 663 8月 8 19:23 10-calico.conflist -rw------- 1 root root 2713 8月 8 20:34 calico-kubeconfig root@zhiyong-ksp1:/etc/cni/net.d# cat 10-calico.conflist { "name": "k8s-pod-network", "cniVersion": "0.3.1", "plugins": [ { "type": "calico", "log_level": "info", "log_file_path": "/var/log/calico/cni/cni.log", "datastore_type": "kubernetes", "nodename": "zhiyong-ksp1", "mtu": 0, "ipam": { "type": "calico-ipam" }, "policy": { "type": "k8s" }, "kubernetes": { "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" } }, { "type": "portmap", "snat": true, "capabilities": {"portMappings": true} }, { "type": "bandwidth", "capabilities": {"bandwidth": true} } ] root@zhiyong-ksp1:/etc/cni/net.d# cat 10-calico.conflist { "name": "k8s-pod-network", "cniVersion": "0.3.1", "plugins": [ { "type": "calico", "log_level": "info", "log_file_path": "/var/log/calico/cni/cni.log", "datastore_type": "kubernetes", "nodename": "zhiyong-ksp1", "mtu": 0, "ipam": { "type": "calico-ipam" }, "policy": { "type": "k8s" }, "kubernetes": { "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" } }, { "type": "portmap", "snat": true, "capabilities": {"portMappings": true} }, { "type": "bandwidth", "capabilities": {"bandwidth": true} } ] }root@zhiyong-ksp1:/etc/cni/net.d# cat calico-kubeconfig # Kubeconfig file for Calico CNI plugin. Installed by calico/node. 
apiVersion: v1 kind: Config clusters: - name: local cluster: server: https://10.233.0.1:443 certificate-authority-data: "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1EZ3dPREF5TURRek5Wb1hEVE15TURnd05UQXlNRFF6TlZvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTHM5ClcxTkxMWGNHNlhIdzZ0VEVyV1pWTXdlUUdTV2IzU3UrMTN0V2REcUlhcm16YW1BWGNNbnlValRoNWhQdFZVVjcKNVdjYldXcFh3VTNOaWhpSXRmOXhoZ2tsMy9KVElycFBSdlRBc3VUVUo1RW9yb3BNLzNpRWpBZUc0d0RNQURtYwpKNHArSjlJSzZWekV4UUI3VTA2L1F6eWhRT3RQQS83dFlhbjM2dFE3eFRJYmJvQ3AvQXRSNHdqOXBBRHVSV1M2CnQ0ZlFZMUh4NHpaS1pmeEpBaXF5MXl5Ylg0ckxSektYMzJ0MXlsYk9ET21kWjZXVjJLZEgzYjV2V3ZrZThzQy8KcHhMT0JvRmRVdU0ra3hkUHgxMitHaVVtbUM0NDFEdU02MVZiQ0o0NlJ4QVlDenY4bmxoQUhrTDMrL3JQZ0U1dgpaYTZuSVoxdWVabFBRRXRqL3FFQ0F3RUFBYU5aTUZjd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZDMkk4MldLNEJjSWpieEQvVjl4U0VnblNhc1pNQlVHQTFVZEVRUU8KTUF5Q0NtdDFZbVZ5Ym1WMFpYTXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBRWlLbklrendTaXpKL0ZhRmd4SQpPRlNoaTNTQ0NaNHNLVXliZVhkZkIwV3FLRHpialBteEZ3LzQ0SFMwUUhaNU5TVGp6WGtHQ1kyTlpDRTE3dldWCmtDYjFVM1czQmdaM05CSmZtV29sTEJQTCtnSkovYlRuRVJUTVY4MDYrTWN6d1RBeEhWcllXcU5BT2o5R3pEdFMKc3FwVWxQZDc1MDdhZmluRmZMVFpORnF4SDV4Y0VTUDNETVF1L21GUXNxMnYyeW9XTXY4dHluVGs2V3VSa0xVQgoxd1JXdUNSeXF1OCs3dEVzMHlCNklTODF0cDBGMHZPekpoakw4bTBxQWhLbUNKUFlGTUFZRFMvNXJuZDBCb3NLClhabHlyUUxtV0ZLRDRWL2Z3T0Vua1hMS3R3VnkrdlFJYXVEWjZTaVM1ODcxMURmdlhTTWFCU1lkL0hwZW1OYmQKVFBFPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==" users: - name: calico user: token: 
eyJhbGciOiJSUzI1NiIsImtpZCI6IkNRb0VCZDRGY21PQjBSYktnYzVuSkV6UVVVY0VvOE1Jd0NCOFRYbEQ5XzQifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNjYwMDQ4NDYxLCJpYXQiOjE2NTk5NjIwNjEsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJjYWxpY28tbm9kZSIsInVpZCI6IjFhNDk4MWY1LWVmMWQtNDk5OC05YTA1LTk4OGU0MmMyN2Q4OCJ9fSwibmJmIjoxNjU5OTYyMDYxLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06Y2FsaWNvLW5vZGUifQ.Qa0KSAJGgNSA9lvND2Ivf9qxZsieI2r1FwCGvwzvXw_d4Nrw5WSygK-9t6tJKCnsXgCQSXijRBFPqiamJYZUx1dhgbPQp8KZF1seqtafCLRNnPS1TUrYJO_SRrp37UizmQzdOQOh7m_SGktcqdViZAyIGapjeMc7P8gU3v1HA93SflnR1keUo5rbXJjpaj2b6F0SBUCVyQnuORopD9cdCH-jIunyp4y_GhOtutV71ZmxcZeCdDqaBAE5OTnIwGYwz5yZqCOJZGqRxI74EX1B06iFgOQs8yksFiEpp5JdFUaCWNnxAeYo5cpH72l2XzF7rb7A2Ob0Rk96wJSSEMJq8g contexts: - name: calico-context context: cluster: local user: calico
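The `token` in calico-kubeconfig is a JWT, so its payload can be decoded to see when it expires. Below is a minimal Python sketch using only the standard library; it does not verify the signature, and the helper names (`jwt_claims`, `is_expired`, `make_unsigned_token`) are my own. Decoding the payload of the token printed above gives `iat` 1659962061 and `exp` 1660048461, that is, a 24-hour lifetime starting on 8 August 2022. That would be consistent with the `Unauthorized` errors seen on 17 August, although this is my reading and the original troubleshooting did not check it.

```python
import base64
import json
import time


def jwt_claims(token):
    """Decode the payload (second segment) of a JWT without verifying it."""
    payload = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(token, now=None):
    """True if the token's `exp` claim is at or before `now` (Unix seconds)."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] <= now


def make_unsigned_token(claims):
    """Build a structurally valid, unsigned JWT for demonstration only."""
    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return ".".join([encode({"alg": "none"}), encode(claims), ""])


if __name__ == "__main__":
    # iat/exp as decoded from the token shown above.
    demo = make_unsigned_token({"iat": 1659962061, "exp": 1660048461})
    claims = jwt_claims(demo)
    print("lifetime in seconds:", claims["exp"] - claims["iat"])
    print("expired now:", is_expired(demo))
```

On a real node, the string to pass to `jwt_claims` would be the `token:` value read from /etc/cni/net.d/calico-kubeconfig.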
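Following this StackOverflow thread: https://stackoverflow.com/questions/61672804/after-uninstalling-calico-new-pods-are-stuck-in-container-creating-state
and this page of the official Kubernetes documentation: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/
the two configuration files under /etc/cni/net.d (10-calico.conflist and calico-kubeconfig) need to be removed. Rather than deleting them outright, I moved them to a backup directory with mv.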
root@zhiyong-ksp1:/home/zhiyong# mkdir -p /fileback/20220817
current-context: calico-context
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 16
drwxr-xr-x 2 kube root 4096 8月 8 10:05 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
-rw-r--r-- 1 root root 663 8月 8 19:23 10-calico.conflist
-rw------- 1 root root 2713 8月 8 20:34 calico-kubeconfig
root@zhiyong-ksp1:/etc/cni/net.d# mv ./10-calico.conflist /fileback/20220817
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 12
drwxr-xr-x 2 kube root 4096 8月 17 01:43 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
-rw------- 1 root root 2713 8月 8 20:34 calico-kubeconfig
root@zhiyong-ksp1:/etc/cni/net.d# mv ./calico-kubeconfig /fileback/20220817
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 8
drwxr-xr-x 2 kube root 4096 8月 17 01:43 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
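The manual mkdir and mv steps above could also be scripted. This is a minimal Python sketch under my own assumptions: `back_up_cni_config` is a hypothetical helper, not a KubeSphere or Calico tool, and the demo runs against a throwaway directory rather than the real /etc/cni/net.d.

```python
import shutil
import tempfile
from pathlib import Path


def back_up_cni_config(cni_dir, backup_dir,
                       names=("10-calico.conflist", "calico-kubeconfig")):
    """Move the named files from cni_dir into backup_dir; return the new paths."""
    cni_dir, backup_dir = Path(cni_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)    # same effect as `mkdir -p`
    moved = []
    for name in names:
        source = cni_dir / name
        if source.exists():                           # skip files already gone
            target = backup_dir / name
            shutil.move(str(source), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Demonstrate on a temporary directory instead of the real CNI directory.
    with tempfile.TemporaryDirectory() as tmp:
        cni = Path(tmp) / "net.d"
        cni.mkdir()
        (cni / "10-calico.conflist").write_text("{}")
        (cni / "calico-kubeconfig").write_text("# placeholder")
        for path in back_up_cni_config(cni, Path(tmp) / "20220817"):
            print("moved", path.name)
```

On the node itself the equivalent call would be `back_up_cni_config("/etc/cni/net.d", "/fileback/20220817")`, run as root. Moving instead of deleting keeps the option of restoring the files if the fix does not work.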
Then reboot the machine. The purpose of the reboot is to refresh Calico's configuration.
root@zhiyong-ksp1:/home/zhiyong# kubectl get pod --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE istio-system istiod-1-11-2-54dd699c87-99krn 1/1 Running 0 65m istio-system jaeger-operator-fccc48b86-vtcr8 0/1 ContainerCreating 0 44m istio-system kiali-75c777bdf6-xhbq7 0/1 ContainerCreating 0 12s istio-system kiali-operator-c459985f7-sttfs 1/1 Running 0 44m kube-system calico-kube-controllers-f9f9bbcc9-2v7lm 1/1 Running 2 (2m54s ago) 8d kube-system calico-node-4mgc7 1/1 Running 2 (2m54s ago) 8d kube-system coredns-f657fccfd-2gw7h 1/1 Running 2 (2m54s ago) 8d kube-system coredns-f657fccfd-pflwf 1/1 Running 2 (2m54s ago) 8d kube-system kube-apiserver-zhiyong-ksp1 1/1 Running 2 (2m54s ago) 8d kube-system kube-controller-manager-zhiyong-ksp1 1/1 Running 2 (2m54s ago) 8d kube-system kube-proxy-cn68l 1/1 Running 2 (2m54s ago) 8d kube-system kube-scheduler-zhiyong-ksp1 1/1 Running 2 (2m54s ago) 8d kube-system nodelocaldns-96gtw 1/1 Running 2 (2m54s ago) 8d kube-system openebs-localpv-provisioner-68db4d895d-p9527 1/1 Running 1 (2m54s ago) 8d kube-system snapshot-controller-0 1/1 Running 2 (2m54s ago) 8d kubesphere-controls-system default-http-backend-587748d6b4-ccg59 1/1 Running 2 (2m54s ago) 8d kubesphere-controls-system kubectl-admin-5d588c455b-82cnk 1/1 Running 2 (2m54s ago) 8d kubesphere-logging-system elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk 0/1 ContainerCreating 0 50m kubesphere-logging-system elasticsearch-logging-data-0 0/1 Pending 0 67m kubesphere-logging-system elasticsearch-logging-discovery-0 0/1 Pending 0 67m kubesphere-monitoring-system alertmanager-main-0 2/2 Running 4 (2m54s ago) 8d kubesphere-monitoring-system kube-state-metrics-6d6786b44-bbb4f 3/3 Running 6 (2m54s ago) 8d kubesphere-monitoring-system node-exporter-8sz74 2/2 Running 4 (2m54s ago) 8d kubesphere-monitoring-system notification-manager-deployment-6f8c66ff88-pt4l8 2/2 Running 4 (2m54s ago) 8d kubesphere-monitoring-system 
notification-manager-operator-6455b45546-nkmx8 2/2 Running 4 (2m54s ago) 8d kubesphere-monitoring-system prometheus-k8s-0 2/2 Running 0 2m5s kubesphere-monitoring-system prometheus-operator-66d997dccf-c968c 2/2 Running 4 (2m54s ago) 8d kubesphere-system ks-apiserver-6b9bcb86f4-hsdzs 0/1 Unknown 1 8d kubesphere-system ks-console-599c49d8f6-ngb6b 1/1 Running 2 (2m54s ago) 8d kubesphere-system ks-controller-manager-66747fcddc-r7cpt 0/1 Unknown 1 8d kubesphere-system ks-installer-5fd8bd46b8-dzhbb 1/1 Running 2 (2m54s ago) 8d
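After the reboot, with Calico's network configuration refreshed, the Pods that had been failing look much healthier (for example, prometheus-k8s-0 is now 2/2 Running).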
kubesphere-logging-system elasticsearch-logging-data-0 0/1 Init:1/2 0 69m
kubesphere-logging-system elasticsearch-logging-discovery-0 0/1 Init:1/2 0 69m
These two Pods are still initializing.
Some Java processes are still using CPU at this point, so wait a little longer:
root@zhiyong-ksp1:/home/zhiyong# kubectl get pod --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE istio-system istiod-1-11-2-54dd699c87-99krn 1/1 Running 0 72m istio-system jaeger-collector-67cfc55477-7757f 1/1 Running 5 (3m41s ago) 6m58s istio-system jaeger-operator-fccc48b86-vtcr8 1/1 Running 0 52m istio-system jaeger-query-8497bdbfd7-csbts 2/2 Running 0 102s istio-system kiali-75c777bdf6-xhbq7 1/1 Running 0 7m37s istio-system kiali-operator-c459985f7-sttfs 1/1 Running 0 52m kube-system calico-kube-controllers-f9f9bbcc9-2v7lm 1/1 Running 2 (10m ago) 8d kube-system calico-node-4mgc7 1/1 Running 2 (10m ago) 8d kube-system coredns-f657fccfd-2gw7h 1/1 Running 2 (10m ago) 8d kube-system coredns-f657fccfd-pflwf 1/1 Running 2 (10m ago) 8d kube-system kube-apiserver-zhiyong-ksp1 1/1 Running 2 (10m ago) 8d kube-system kube-controller-manager-zhiyong-ksp1 1/1 Running 2 (10m ago) 8d kube-system kube-proxy-cn68l 1/1 Running 2 (10m ago) 8d kube-system kube-scheduler-zhiyong-ksp1 1/1 Running 2 (10m ago) 8d kube-system nodelocaldns-96gtw 1/1 Running 2 (10m ago) 8d kube-system openebs-localpv-provisioner-68db4d895d-p9527 1/1 Running 1 (10m ago) 8d kube-system snapshot-controller-0 1/1 Running 2 (10m ago) 8d kubesphere-controls-system default-http-backend-587748d6b4-ccg59 1/1 Running 2 (10m ago) 8d kubesphere-controls-system kubectl-admin-5d588c455b-82cnk 1/1 Running 2 (10m ago) 8d kubesphere-logging-system elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk 0/1 Completed 0 57m kubesphere-logging-system elasticsearch-logging-data-0 1/1 Running 0 74m kubesphere-logging-system elasticsearch-logging-discovery-0 1/1 Running 0 74m kubesphere-monitoring-system alertmanager-main-0 2/2 Running 4 (10m ago) 8d kubesphere-monitoring-system kube-state-metrics-6d6786b44-bbb4f 3/3 Running 6 (10m ago) 8d kubesphere-monitoring-system node-exporter-8sz74 2/2 Running 4 (10m ago) 8d kubesphere-monitoring-system notification-manager-deployment-6f8c66ff88-pt4l8 2/2 Running 4 (10m 
ago) 8d kubesphere-monitoring-system notification-manager-operator-6455b45546-nkmx8 2/2 Running 4 (10m ago) 8d kubesphere-monitoring-system prometheus-k8s-0 2/2 Running 0 9m30s kubesphere-monitoring-system prometheus-operator-66d997dccf-c968c 2/2 Running 4 (10m ago) 8d kubesphere-system ks-apiserver-6b9bcb86f4-hsdzs 1/1 Running 2 (10m ago) 8d kubesphere-system ks-console-599c49d8f6-ngb6b 1/1 Running 2 (10m ago) 8d kubesphere-system ks-controller-manager-66747fcddc-r7cpt 1/1 Running 2 (10m ago) 8d kubesphere-system ks-installer-5fd8bd46b8-dzhbb 1/1 Running 2 (10m ago) 8d
Every Pod is now Running except elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk, which is Completed.
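Scanning a long Pod listing by eye is error-prone, so the same check can be done mechanically. This is a minimal Python sketch, assuming the plain-text output of `kubectl get pod --all-namespaces` is already available as a string; `unhealthy_pods` is my own helper name, and the sample rows are copied from the first listing after the reboot.

```python
SAMPLE = """\
NAMESPACE NAME READY STATUS RESTARTS AGE
istio-system istiod-1-11-2-54dd699c87-99krn 1/1 Running 0 65m
istio-system kiali-75c777bdf6-xhbq7 0/1 ContainerCreating 0 12s
kube-system calico-node-4mgc7 1/1 Running 2 (2m54s ago) 8d
kubesphere-logging-system elasticsearch-logging-data-0 0/1 Pending 0 67m
kubesphere-system ks-apiserver-6b9bcb86f4-hsdzs 0/1 Unknown 1 8d
"""


def unhealthy_pods(listing, healthy=("Running", "Completed")):
    """Return (namespace, name, status) for pods whose STATUS is not healthy.

    `listing` is the text of `kubectl get pod --all-namespaces`, one pod per
    line, with the header line included.
    """
    result = []
    for line in listing.strip().splitlines()[1:]:     # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue                                   # ignore malformed rows
        namespace, name, _ready, status = fields[:4]
        if status not in healthy:
            result.append((namespace, name, status))
    return result


if __name__ == "__main__":
    for namespace, name, status in unhealthy_pods(SAMPLE):
        print(f"{namespace}/{name}: {status}")
```

The sketch looks only at the STATUS column; a Pod that is Running but not fully ready (for example 0/1 in the READY column) would need an extra check.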
The web UI also shows everything green, with no errors. Evidently the Calico, Istio, and Prometheus Pods have all been repaired.
root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk -n kubesphere-logging-system Name: elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk Namespace: kubesphere-logging-system Priority: 0 Node: zhiyong-ksp1/192.168.88.20 Start Time: Wed, 17 Aug 2022 01:00:00 +0800 Labels: app=elasticsearch-curator controller-uid=d95b480d-abb9-42ed-9c1e-873127f96dc1 job-name=elasticsearch-logging-curator-elasticsearch-curator-27677820 release=elasticsearch-logging-curator Annotations: cni.projectcalico.org/containerID: 584387ef1390db6f2d17ee0e2bc92951178cdb373c34544ecf150151253f4766 cni.projectcalico.org/podIP: cni.projectcalico.org/podIPs: Status: Succeeded IP: 10.233.107.51 IPs: IP: 10.233.107.51 Controlled By: Job/elasticsearch-logging-curator-elasticsearch-curator-27677820 Containers: elasticsearch-curator: Container ID: containerd://a2b7da0a34df9601acc062b10691dbbfad5bc22a838e18d9b95f3bd57633479e Image: registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6 Image ID: registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator@sha256:0fdc68b2a211f753238f9d54734b331141a9ade5bf31eef801ea0d056c9ab1c1 Port: <none> Host Port: <none> Command: curator/curator Args: --config /etc/es-curator/config.yml /etc/es-curator/action_file.yml State: Terminated Reason: Completed Exit Code: 0 Started: Wed, 17 Aug 2022 01:51:12 +0800 Finished: Wed, 17 Aug 2022 01:51:12 +0800 Ready: False Restart Count: 0 Environment: <none> Mounts: /etc/es-curator from config-volume (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kvk6g (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: config-volume: Type: ConfigMap (a volume populated by a ConfigMap) Name: elasticsearch-logging-curator-elasticsearch-curator-config Optional: false kube-api-access-kvk6g: Type: Projected (a volume that contains injected data from multiple sources) 
TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true QoS Class: BestEffort Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 64m default-scheduler Successfully assigned kubesphere-logging-system/elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk to zhiyong-ksp1 Warning FailedCreatePodSandBox 64m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "01c36acd52449dcec6b1bcac2a1f3c57577195fd915aef6ca8d1ff53ed9b5a35": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 64m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c0754d78516e0b4a99993dd31a5608da1b424e558560ea2c66f98856928604a9": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 64m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0dc2bab36922b4a73c35f3b35ffd4ef46f825fd5b053454c47665d028cd89d61": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 63m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cc132b632133dbc2ef32eed74bbfb9e64923530467ccd085d67907542a4cfea8": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 63m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox 
"8b2f3f1f0d0ebac8a0b43025d22de1c0e1b55edbc72fec6930477061f0b46bbd": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 63m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d6ea17333ad9c2d549f439a25b83fdb8b7338f8e4a00e5fd7adbbab1bc7c78e2": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 63m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d6cafd828a3fa61977ca2423bf953b7aab8f114af042fb272e7172d7f55078a6": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 62m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a555a64631dab504aeacecd828e512b84a5396f0c779b42a1398518740c858d0": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 62m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a204a8cd54ce4aa97875269c3475c48266de84a214633ee9eaca8b505df52735": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning FailedCreatePodSandBox 24m (x175 over 62m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "269a0272273b83edeb22c573c3bceeeb40d48bef4cafd0b91da1aa6617b1f3d4": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized Warning NetworkNotReady 19m (x55 over 21m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network 
plugin returns error: cni plugin not initialized Warning NetworkNotReady 16m (x5 over 16m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized Warning FailedMount 16m (x4 over 16m) kubelet MountVolume.SetUp failed for volume "kube-api-access-kvk6g" : object "kubesphere-logging-system"/"kube-root-ca.crt" not registered Warning FailedMount 16m (x5 over 16m) kubelet MountVolume.SetUp failed for volume "config-volume" : object "kubesphere-logging-system"/"elasticsearch-logging-curator-elasticsearch-curator-config" not registered Normal Pulling 16m kubelet Pulling image "registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6" Normal Pulled 13m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6" in 3m3.099253003s Normal Created 13m kubelet Created container elasticsearch-curator Normal Started 13m kubelet Started container elasticsearch-curator
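As the events show, after failing for a long time this Pod finally managed to pull the image registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6, then created and started the container elasticsearch-curator. Having done its job, the container exited normally (exit code 0). The Pod is controlled by a Job, so Completed is its expected final state.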
At this point, KubeSphere has successfully brought up the Istio service mesh.