Deploy the kube-prometheus-stack chart with Helm:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo kube-prometheus-stack
helm pull prometheus-community/kube-prometheus-stack --version 41.4.1
tar xf kube-prometheus-stack-41.4.1.tgz && cd kube-prometheus-stack
Before deploying Prometheus, the following were already in place:
- A StorageClass named nfs-client
- An ingress controller deployed in the ingress-nginx namespace
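Both prerequisites can be checked before installing. The following is a minimal sketch, assuming kubectl is already configured for the target cluster; the function name check_prereqs is hypothetical and only illustrates the idea.

```shell
#!/usr/bin/env bash
# Verify the prerequisites listed above before installing the chart.
# check_prereqs is a hypothetical helper name, not part of kubectl or Helm.
check_prereqs() {
  local ok=0
  # The StorageClass referenced by the volume claim templates must exist.
  if ! kubectl get storageclass nfs-client >/dev/null 2>&1; then
    echo "missing StorageClass: nfs-client"
    ok=1
  fi
  # The ingress controller is expected in the ingress-nginx namespace.
  if ! kubectl get namespace ingress-nginx >/dev/null 2>&1; then
    echo "missing namespace: ingress-nginx"
    ok=1
  fi
  return "$ok"
}
```

Calling check_prereqs prints one line per missing prerequisite and returns a non-zero status if anything is missing.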
## Edit values.yaml and adjust the following settings
alertmanager:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - alertmanager.local
    paths:
      - /
  alertmanagerSpec:
    retention: 720h
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
---
grafana:
  adminPassword: 1qaz2wsx
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - grafana.local
---
prometheus:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - prometheus.local
    paths:
      - /
  prometheusSpec:
    retention: 360d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 300Gi
## Change the image address
prometheusOperator:
  admissionWebhooks:
    patch:
      image:
        repository: registry.aliyuncs.com/google_containers/kube-webhook-certgen
---
## charts/grafana/values.yaml
persistence:
  enabled: true
  storageClassName: nfs-client
  size: 100Gi
## charts/kube-state-metrics/values.yaml
## Change the image address
image:
  repository: bitnami/kube-state-metrics
  tag: 2.6.0
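As an alternative to editing the chart's files in place, the same settings can be kept in a separate override file and passed to Helm with -f. The sketch below writes a small subset of the settings above; the function name write_overrides and the file name are hypothetical. It relies on Helm's convention that subchart values are set under the subchart's name (here grafana) in the parent values.

```shell
#!/usr/bin/env bash
# Write a small override file instead of editing the chart's values.yaml.
# write_overrides is a hypothetical helper; the keys mirror the settings
# shown above.
write_overrides() {
  cat > "$1" <<'EOF'
prometheus:
  prometheusSpec:
    retention: 360d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 300Gi
# Subchart values go under the subchart's name in the parent file.
grafana:
  persistence:
    enabled: true
    storageClassName: nfs-client
    size: 100Gi
EOF
}
```

For example, run write_overrides custom-values.yaml and then pass the file at install time with helm install <release> . -n monitoring -f custom-values.yaml. Keeping overrides in their own file makes later chart upgrades easier to diff.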
## Deploy the services to the monitoring namespace
kubectl create ns monitoring
helm install prometheus . -n monitoring
## Check that everything is running
kubectl get all -n monitoring
NAME                                                         READY   STATUS    RESTARTS        AGE
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   1 (2m37s ago)   102s
pod/prometheus-grafana-7c466d88c5-tq9zh                      3/3     Running   0               17m
pod/prometheus-kube-prometheus-operator-67b84b5d9b-z7cws     1/1     Running   0               17m
pod/prometheus-kube-state-metrics-77d5757f57-chrnx           1/1     Running   0               17m
pod/prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0               17m
pod/prometheus-prometheus-node-exporter-gj6rr                1/1     Running   0               17m
pod/prometheus-prometheus-node-exporter-rkl6q                1/1     Running   0               17m

NAME                                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   17m
service/prometheus-grafana                        ClusterIP   172.24.140.186   <none>        80/TCP                       17m
service/prometheus-kube-prometheus-alertmanager   ClusterIP   172.24.60.136    <none>        9093/TCP                     17m
service/prometheus-kube-prometheus-operator       ClusterIP   172.24.106.230   <none>        443/TCP                      17m
service/prometheus-kube-prometheus-prometheus     ClusterIP   172.24.114.84    <none>        9090/TCP                     17m
service/prometheus-kube-state-metrics             ClusterIP   172.24.250.206   <none>        8080/TCP                     17m
service/prometheus-operated                       ClusterIP   None             <none>        9090/TCP                     17m
service/prometheus-prometheus-node-exporter       ClusterIP   172.24.74.178    <none>        9100/TCP                     17m

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-prometheus-node-exporter   2         2         2       2            2           <none>          17m

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-grafana                    1/1     1            1           17m
deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           17m
deployment.apps/prometheus-kube-state-metrics         1/1     1            1           17m

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-grafana-7c466d88c5                    1         1         1       17m
replicaset.apps/prometheus-kube-prometheus-operator-67b84b5d9b   1         1         1       17m
replicaset.apps/prometheus-kube-state-metrics-77d5757f57         1         1         1       17m

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager   1/1     17m
statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus       1/1     17m
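With this many pods, scanning the READY column by eye is error-prone. A minimal sketch of a filter that prints only the pods that are not fully ready, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, ...); the function name not_ready_pods is hypothetical.

```shell
#!/usr/bin/env bash
# Print pods from `kubectl get pods` output whose READY count is incomplete
# or whose STATUS is not Running. not_ready_pods is a hypothetical helper.
not_ready_pods() {
  awk 'NR > 1 {
    # READY has the form "ready/total", e.g. "2/2".
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}
```

Usage: kubectl get pods -n monitoring | not_ready_pods. Empty output means every pod is Running with all containers ready. Note that pods of finished Jobs (status Completed) would also be listed by this simple filter.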
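Troubleshooting:
Error 1: 'The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes'
Fix: cd ./kube-prometheus-stack/crds/ && kubectl create -f crd-prometheuses.yaml
This error typically appears when the CRD is submitted with kubectl apply, which stores the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation; kubectl create does not add that annotation, so the size limit is not exceeded.
Error 2: 'failed calling webhook "prometheusrulemutate.monitoring.coreos.com"'
Fix: most likely a different version of Prometheus was installed earlier and its webhook resources were not completely removed when it was uninstalled. Use kubectl get mutatingwebhookconfigurations and kubectl get validatingwebhookconfigurations to find the offending objects and delete them, for example: kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io prometheus-kube-prometheus-admission. Then upgrade the deployment again.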
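The two lookups for leftover webhook configurations can be combined into one call. A minimal sketch, assuming the leftover objects have "prometheus" in their names, as in the example above; the function name leftover_webhooks is hypothetical.

```shell
#!/usr/bin/env bash
# List admission webhook configurations whose names match a pattern
# (default: "prometheus"). leftover_webhooks is a hypothetical helper.
leftover_webhooks() {
  local pattern="${1:-prometheus}"
  # -o name prints one "<resource>/<name>" entry per line.
  kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o name \
    | grep -i -- "$pattern"
}
```

Review the printed list before deleting anything; each entry is in the resource/name form that kubectl delete accepts.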
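When you open prometheus.local and go to Status -> Targets, you will see that Prometheus cannot scrape metrics from some components. Most Kubernetes components expose their metrics on a /metrics endpoint over HTTP or HTTPS. For components whose endpoint is not reachable from outside the host by default (for example, because it listens only on 127.0.0.1), it can be exposed with the --bind-address flag.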
Add local name resolution for prometheus.local, alertmanager.local, and grafana.local to the hosts file.
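A hosts entry is one line with an IP address followed by the host names that resolve to it. A minimal sketch that builds such a line; the function name hosts_line is hypothetical, and 192.0.2.10 in the usage note is a placeholder for the address where your ingress controller is reachable.

```shell
#!/usr/bin/env bash
# Build one /etc/hosts line mapping an IP address to several host names.
# hosts_line is a hypothetical helper, not an existing tool.
hosts_line() {
  local ip="$1"
  shift
  # "$*" joins the remaining arguments with single spaces.
  echo "$ip $*"
}
```

Usage: hosts_line 192.0.2.10 prometheus.local alertmanager.local grafana.local | sudo tee -a /etc/hosts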
For kube-controller-manager, edit the static Pod manifest:
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    ### Modified
    #- --bind-address=127.0.0.1
    - --bind-address=0.0.0.0
...omitted...
Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/#options
--bind-address string     Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or :: ), all interfaces will be used.
lsof -i:10257
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-cont 13515 root    7u  IPv6 38954551      0t0  TCP *:10257 (LISTEN)
kube-cont 13515 root   34u  IPv6 38968508      0t0  TCP master01.pl.hpc:10257->node1:61771 (ESTABLISHED)
### 10257 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10257/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key
sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
...omitted...
For kube-proxy, edit the ConfigMap:
kubectl edit cm -n kube-system kube-proxy
...omitted...
kind: KubeProxyConfiguration
### Modified
# metricsBindAddress: ""
metricsBindAddress: 0.0.0.0:10249
mode: ipvs
...omitted...
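Note: the kube-proxy configuration is mounted into the container from a ConfigMap, so do not add the --metrics-bind-address flag directly to the kube-proxy DaemonSet; a flag added that way has no effect, because it is ignored when a config file is specified with --config.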
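kube-proxy reads its configuration file at startup, so the running pods need to be restarted before the edited ConfigMap takes effect. A minimal sketch, assuming the DaemonSet is named kube-proxy in the kube-system namespace (the kubeadm default); the function name restart_kube_proxy and the 120s timeout are illustrative choices.

```shell
#!/usr/bin/env bash
# Restart kube-proxy so the pods re-read the edited ConfigMap, then wait
# for the rollout to finish. restart_kube_proxy is a hypothetical helper.
restart_kube_proxy() {
  kubectl -n kube-system rollout restart daemonset kube-proxy &&
    kubectl -n kube-system rollout status daemonset kube-proxy --timeout=120s
}
```

After restart_kube_proxy returns successfully, the metrics port should be listening on all interfaces on every node.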
lsof -i:10249
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-prox 36749 root   13u  IPv6 39030529      0t0  TCP master01.pl.hpc:10249->node1:38685 (ESTABLISHED)
kube-prox 36749 root   14u  IPv6 39100619      0t0  TCP *:10249 (LISTEN)
curl 10.49.18.103:10249/metrics
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 13
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
...omitted...
Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/
--metrics-bind-address ipport     Default: 127.0.0.1:10249
The IP address with port for the metrics server to serve on (set to ‘0.0.0.0:10249’ for all IPv4 interfaces and ‘[::]:10249’ for all IPv6 interfaces). Set empty to disable. This parameter is ignored if a config file is specified by --config.
For kube-scheduler, edit the static Pod manifest:
vim /etc/kubernetes/manifests/kube-scheduler.yaml
...omitted...
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    ### Modified
    #- --bind-address=127.0.0.1
    - --bind-address=0.0.0.0
...omitted...
lsof -i:10259
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-sche 24404 root    7u  IPv6 38957456      0t0  TCP *:10259 (LISTEN)
kube-sche 24404 root   10u  IPv6 39009914      0t0  TCP master01.pl.hpc:10259->node1:29900 (ESTABLISHED)
### 10259 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10259/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key --insecure
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="7200"} 0
...omitted...
Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
--bind-address string     Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces will be used.
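The three checks differ only in scheme and port. A minimal sketch that maps a component name to the metrics URL used with curl in this post; the function name metrics_url is hypothetical, and the ports are the ones shown in the lsof output for each component.

```shell
#!/usr/bin/env bash
# Map a component name to its metrics URL on a given node IP.
# metrics_url is a hypothetical helper; the ports match those used in this post.
metrics_url() {
  local component="$1" ip="$2"
  case "$component" in
    kube-controller-manager) echo "https://$ip:10257/metrics" ;;
    kube-scheduler)          echo "https://$ip:10259/metrics" ;;
    kube-proxy)              echo "http://$ip:10249/metrics" ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
}
```

For example, metrics_url kube-proxy 10.49.18.103 prints the URL requested in the kube-proxy check. The two HTTPS endpoints still need the client certificate options shown in the corresponding curl commands.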