Environment:
1. kubekey 2.2.1
2. kubesphere 3.3.0
3. kubernetes 1.23.7
4. kube-prometheus-stack 35.0.0
Official documentation: https://kubesphere.com.cn/docs/v3.3/faq/observability/byop/
To use your own Prometheus stack, follow these steps:
1. Uninstall KubeSphere's customized Prometheus stack
2. Install your own Prometheus stack
3. Install the KubeSphere customized components into your Prometheus stack
4. Change KubeSphere's monitoring endpoint
Step 1. Uninstall KubeSphere's customized Prometheus stack
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/alertmanager/ 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/devops/ 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/etcd/ 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/grafana/ 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/kube-state-metrics/ 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/node-exporter/ 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/upgrade/ 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/prometheus-rules-v1.16\+.yaml 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/prometheus-rules.yaml 2>/dev/null
kubectl -n kubesphere-system exec $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- kubectl delete -f /kubesphere/kubesphere/prometheus/prometheus 2>/dev/null
kubectl delete deploy -n kubesphere-monitoring-system prometheus-operator
kubectl delete svc -n kubesphere-monitoring-system prometheus-operator
kubectl delete prometheusrules.monitoring.coreos.com -n kubesphere-monitoring-system prometheus-operator-rules prometheus-k8s-rules
kubectl delete servicemonitor -n kubesphere-monitoring-system coredns kube-apiserver kube-controller-manager kube-scheduler kubelet prometheus-operator
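The ten near-identical `kubectl exec` commands above can also be written as a loop. The following is only a sketch: the names `KS_PROM_MANIFESTS`, `remove_ks_prometheus` and the `DRY_RUN` switch are made up for illustration and are not part of KubeSphere. The paths are the ones used in the commands above.

```shell
# Manifest paths below /kubesphere/kubesphere/prometheus/, in the order used above.
# (Hypothetical variable name; the paths come from the original commands.)
KS_PROM_MANIFESTS='alertmanager/ devops/ etcd/ grafana/ kube-state-metrics/ node-exporter/ upgrade/ prometheus-rules-v1.16+.yaml prometheus-rules.yaml prometheus'

# Delete every manifest through the ks-installer pod.
# With DRY_RUN=1 the kubectl commands are printed instead of executed.
remove_ks_prometheus() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    run=echo
    pod='<ks-installer-pod>'
  else
    run=
    pod=$(kubectl get pod -n kubesphere-system -l app=ks-installer \
      -o jsonpath='{.items[0].metadata.name}')
  fi
  for path in $KS_PROM_MANIFESTS; do
    # Errors are ignored, as in the original commands (2>/dev/null).
    $run kubectl -n kubesphere-system exec "$pod" -- \
      kubectl delete -f "/kubesphere/kubesphere/prometheus/$path" 2>/dev/null || true
  done
}
```

Running `DRY_RUN=1 remove_ks_prometheus` first prints the ten commands so they can be reviewed; `remove_ks_prometheus` on its own executes them.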
Step 2. Install your own Prometheus stack
# Add the prometheus-community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Update the repo index
helm repo update
# List the configured repos
helm repo list
# Set variables
pro=kube-prometheus-stack
chart_version=35.0.0
mkdir -p /data/$pro
cd /data/$pro
# Download the chart
helm pull prometheus-community/$pro --version=$chart_version
# Extract values.yaml from the chart archive
tar zxvf $pro-$chart_version.tgz --strip-components 1 $pro/values.yaml
# Write the install script ($pro and $chart_version are expanded now, when the file is written)
cat > /data/$pro/start.sh << EOF
helm upgrade --install --create-namespace $pro $pro-$chart_version.tgz \
  -f values.yaml \
  -n monitoring
EOF
bash /data/kube-prometheus-stack/start.sh
KubeSphere 3.3.0 is certified to work with the following Prometheus stack components:
Prometheus Operator v0.38.3+
Prometheus v2.20.1+
Alertmanager v0.21.0+
kube-state-metrics v1.9.6
node-exporter v0.18.1
Make sure your Prometheus stack component versions meet the requirements above, especially node-exporter and kube-state-metrics.
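A minimum-version check like the one above can be scripted. This is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils does); `version_ge` is a hypothetical helper name, and how you obtain the installed version string is left to you.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# A leading "v" is stripped from both arguments. Requires sort -V.
version_ge() {
  a=${1#v}
  b=${2#v}
  # The smaller version sorts first; A >= B exactly when B is that first line.
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}
```

For example, `version_ge "$installed" v2.20.1 || echo "Prometheus is too old"` would flag a Prometheus version below the listed minimum, with `$installed` holding the version you read from your own deployment.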
Troubleshooting:
1. To monitor kube-proxy, change metricsBindAddress in the kube-proxy ConfigMap
kubectl edit cm -n kube-system kube-proxy
|
V
metricsBindAddress: "0.0.0.0:10249" # address and port that kube-proxy serves metrics on
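The same change can be made without an interactive editor. The sketch below assumes the ConfigMap stores `metricsBindAddress` on its own line, as it appears in `kubectl edit`; the function names are hypothetical. kube-proxy reads its configuration at startup, so the DaemonSet is restarted afterwards.

```shell
# Filter: rewrite the metricsBindAddress line on stdin, keeping its indentation.
set_metrics_bind() {
  sed 's/^\( *metricsBindAddress:\).*/\1 "0.0.0.0:10249"/'
}

# Apply the filter to the live ConfigMap, then restart kube-proxy
# so that the pods pick up the new setting.
patch_kube_proxy() {
  kubectl -n kube-system get cm kube-proxy -o yaml | set_metrics_bind | kubectl replace -f -
  kubectl -n kube-system rollout restart daemonset kube-proxy
}
```

Call `patch_kube_proxy` once the filter output looks right for your cluster, for example after inspecting `kubectl -n kube-system get cm kube-proxy -o yaml | set_metrics_bind`.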
2. Monitoring an external etcd
cd /etc/ssl/etcd/ssl
cp admin-master01-key.pem etcd-client-key.pem # the admin-<node>-key.pem file name depends on your node name; adjust it to your environment
cp admin-master01.pem etcd-client.pem
kubectl create secret generic -n monitoring etcd-client-cert \
--from-file=etcd-ca=ca.pem \
--from-file=etcd-client=etcd-client.pem \
--from-file=etcd-client-key=etcd-client-key.pem
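Before running the `kubectl create secret` command above, it can help to confirm that the three files it references exist, since a wrong source file name in the `cp` step is easy to miss. A small sketch with a hypothetical helper name:

```shell
# check_etcd_certs [DIR]: verify that the files used for the secret are readable.
# DIR defaults to /etc/ssl/etcd/ssl, the directory used above.
check_etcd_certs() {
  dir=${1:-/etc/ssl/etcd/ssl}
  missing=0
  for f in ca.pem etcd-client.pem etcd-client-key.pem; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

`check_etcd_certs && echo "all certificate files present"` prints the confirmation only when every file is in place.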
Modify values.yaml of kube-prometheus-stack:
kubeEtcd:
  enabled: true
  endpoints:
    - 192.168.11.100 # IP address of the external etcd
    - 192.168.11.101 # IP address of the external etcd
    - 192.168.11.102 # IP address of the external etcd
  service:
    enabled: true
    port: 2379
    targetPort: 2379
  serviceMonitor:
    enabled: true
    interval: ""
    proxyUrl: ""
    scheme: https # use the HTTPS protocol
    insecureSkipVerify: true # do not verify the server certificate
    serverName: "localhost"
    caFile: /etc/prometheus/secrets/etcd-client-cert/etcd-ca # certificate path (inside the pod)
    certFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client
    keyFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key
prometheus:
  prometheusSpec:
    secrets:
      - etcd-client-cert # mount the certificate secret into Prometheus
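3. Set the Prometheus rule evaluation interval to 1m so that it matches the custom ServiceMonitors of KubeSphere 3.3.0. The rule evaluation interval should be greater than or equal to the scrape interval.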
kubectl -n monitoring patch prometheuses.monitoring.coreos.com kube-prometheus-stack-prometheus --patch '{
"spec": {
"evaluationInterval": "1m"
}
}' --type=merge
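The rule "evaluation interval greater than or equal to scrape interval" can be checked with a small comparison. This sketch only understands single-unit durations in s, m or h (Prometheus accepts more forms, such as ms or 1h30m), and both function names are hypothetical.

```shell
# to_seconds DURATION: convert e.g. 30s, 1m or 2h to seconds.
# Anything else (no unit, ms, compound durations) is rejected.
to_seconds() {
  num=${1%[smh]}
  case $num in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case $1 in
    *s) echo "$num" ;;
    *m) echo $((num * 60)) ;;
    *h) echo $((num * 3600)) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# interval_ok EVAL SCRAPE: succeeds when EVAL >= SCRAPE.
interval_ok() {
  e=$(to_seconds "$1") && s=$(to_seconds "$2") && [ "$e" -ge "$s" ]
}
```

The current value can be read with `kubectl -n monitoring get prometheus kube-prometheus-stack-prometheus -o jsonpath='{.spec.evaluationInterval}'` and passed as the first argument, with the scrape interval of your ServiceMonitors as the second.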
4. Change the monitoring endpoint to your own Prometheus:
kubectl edit cm -n kubesphere-system kubesphere-config # this change is lost after the cluster restarts
monitoring:
  endpoint: http://prometheus-operated.monitoring.svc:9090
kubectl edit cc -n kubesphere-system ks-installer # this change survives a cluster restart
monitoring:
  endpoint: http://prometheus-operated.monitoring.svc:9090
5. Run the following command to restart the KubeSphere API server.
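After editing, the endpoint that KubeSphere will actually use can be read back from the ConfigMap. The sketch assumes the setting lives under a top-level `monitoring:` key in the `kubesphere.yaml` entry of `kubesphere-config`, as shown above; the function names are hypothetical.

```shell
# Filter: print the endpoint found under the top-level "monitoring:" key of
# the YAML on stdin. Endpoints of other sections are ignored.
monitoring_endpoint() {
  awk '/^monitoring:/ { in_mon = 1; next }
       /^[^ ]/        { in_mon = 0 }
       in_mon && $1 == "endpoint:" { print $2; exit }'
}

# Read the live value from the kubesphere-config ConfigMap.
current_ks_endpoint() {
  kubectl -n kubesphere-system get cm kubesphere-config \
    -o jsonpath='{.data.kubesphere\.yaml}' | monitoring_endpoint
}
```

`current_ks_endpoint` should print `http://prometheus-operated.monitoring.svc:9090` once the change is in place.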
kubectl -n kubesphere-system rollout restart deployment/ks-apiserver
or
kubectl rollout restart deploy -n kubesphere-system ks-installer
6. If the charts on the KubeSphere dashboard stay empty, remember to update the prometheusrules.monitoring.coreos.com resources, because many of the metrics are computed by recording rules (record).
# Find the labels that kube-prometheus-stack uses to select PrometheusRules and ServiceMonitors
kubectl get prometheus -n monitoring kube-prometheus-stack-prometheus -o yaml
  ruleSelector:
    matchLabels:
      release: kube-prometheus-stack # PrometheusRules are selected by this label
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack # ServiceMonitors are selected by this label
# Download kubernetes-prometheusRule.yaml, which relates to the apiserver metrics
wget https://raw.githubusercontent.com/kubesphere/ks-installer/master/roles/ks-monitor/files/prometheus/kubernetes/kubernetes-prometheusRule.yaml
# Edit kubernetes-prometheusRule.yaml
vi kubernetes-prometheusRule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: kube-prometheus-stack # change the labels so that kube-prometheus-stack picks up the rules in this PrometheusRule
  name: prometheus-k8s-rules
  namespace: monitoring # change to monitoring
# Apply kubernetes-prometheusRule.yaml
kubectl apply -f kubernetes-prometheusRule.yaml
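The manual edit of kubernetes-prometheusRule.yaml can be scripted as a filter. This is a sketch under one assumption about the downloaded file: `labels:` and `namespace:` under `metadata` are indented by exactly two spaces. `retarget_rule` is a hypothetical name. Only the first matching `labels:` line is touched, so the `labels:` blocks of individual alert rules (which are indented deeper) stay unchanged.

```shell
# Filter: add the release label to metadata.labels and set metadata.namespace
# to monitoring. Everything else is passed through unchanged.
retarget_rule() {
  awk '
    !added && /^  labels:/ {
      print
      print "    release: kube-prometheus-stack"
      added = 1
      next
    }
    !moved && /^  namespace:/ {
      print "  namespace: monitoring"
      moved = 1
      next
    }
    { print }
  '
}
```

For example, `retarget_rule < kubernetes-prometheusRule.yaml > rule-monitoring.yaml` writes the adjusted copy (the output file name is arbitrary), which can be reviewed and then applied with `kubectl apply -f`.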
7. kube-prometheus-stack-node.rules conflicts with the 'node_namespace_pod:kube_pod_info:' and node:node_num_cpu:sum records in kubernetes-prometheusRule.yaml.
kubectl edit prometheusrules.monitoring.coreos.com -n monitoring kube-prometheus-stack-node.rules
# Delete the following entries
    - expr: |-
        topk by(cluster, namespace, pod) (1,
          max by (cluster, node, namespace, pod) (
            label_replace(kube_pod_info{job="kube-state-metrics",node!=""}, "pod", "$1", "pod", "(.*)")
        ))
      record: 'node_namespace_pod:kube_pod_info:'
    - expr: |-
        count by (cluster, node) (sum by (node, cpu) (
          node_cpu_seconds_total{job="node-exporter"}
        * on (namespace, pod) group_left(node)
          topk by(namespace, pod) (1, node_namespace_pod:kube_pod_info:)
        ))
      record: node:node_num_cpu:sum
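Conflicts of this kind can be found by listing the record names that are defined more than once. The helper below is a rough sketch with a hypothetical name; it reads PrometheusRule YAML from stdin or from files and compares names only, so a record that is intentionally defined twice with different labels is reported as well.

```shell
# duplicate_records [FILE...]: print every record name that occurs more than once.
# Handles both "record: name" and "- record: name" lines; quotes are stripped.
duplicate_records() {
  awk '$1 == "record:" || ($1 == "-" && $2 == "record:") { print $NF }' "$@" |
    tr -d "'\"" | sort | uniq -d
}
```

One way to use it is `kubectl get prometheusrules -n monitoring -o yaml | duplicate_records`, which lists duplicated record names across all PrometheusRules in the namespace.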
8. Verification: data and charts now display correctly.