
Deploying kube-prometheus-stack


1. Download the kube-prometheus-stack Helm chart

Deploy the kube-prometheus-stack chart with Helm. First add the chart repository, then download and unpack the chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo kube-prometheus-stack
helm pull prometheus-community/kube-prometheus-stack --version 41.4.1
tar xf kube-prometheus-stack-41.4.1.tgz && cd kube-prometheus-stack
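If you run helm pull without --version, it downloads the newest chart release. The tarball name in the tar xf step may then not match. The sketch below pulls and extracts the same version. It assumes that helm search repo prints NAME and CHART VERSION as its first two columns. The function name chart_version is hypothetical.

```shell
# Print the CHART VERSION column (field 2) for an exact chart name,
# reading `helm search repo` output from stdin. Prints nothing if absent.
chart_version() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage (needs helm and network access):
#   chart=prometheus-community/kube-prometheus-stack
#   ver=$(helm search repo "$chart" | chart_version "$chart")
#   helm pull "$chart" --version "$ver"
#   tar xf "kube-prometheus-stack-${ver}.tgz" && cd kube-prometheus-stack
```

Matching the exact NAME field avoids picking up other charts that the keyword search also returns.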

2. Edit values.yaml

Before deploying Prometheus, the following had already been prepared:

  • a StorageClass named nfs-client
  • an ingress controller deployed in the ingress-nginx namespace
## Edit values.yaml and adjust the following settings
alertmanager:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
    - alertmanager.local
    paths:
    - /
  alertmanagerSpec:
    retention: 720h
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi

grafana:
  adminPassword: 1qaz2wsx
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
    - grafana.local

prometheus:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
    - prometheus.local
    paths:
    - /
  prometheusSpec:
    retention: 360d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-client
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 300Gi

## Change the image address
prometheusOperator:
  admissionWebhooks:
    patch:
      image:
        repository: registry.aliyuncs.com/google_containers/kube-webhook-certgen

## charts/grafana/values.yaml (a separate file inside the grafana subchart)
persistence:
  enabled: true
  storageClassName: nfs-client
  size: 100Gi

## charts/kube-state-metrics/values.yaml
## Change the image address
image:
  repository: bitnami/kube-state-metrics
  tag: 2.6.0
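Hand-editing values.yaml can easily introduce tab-indented lines or stray --- separators. YAML does not allow tabs for indentation. A multi-document values file is also a likely source of silently ignored settings. The sketch below is a minimal pre-install check; the function name check_values_file is hypothetical. You can also run helm lint . from the chart directory to validate the chart as a whole.

```shell
# Report two problems that are easy to introduce in a hand-edited values file:
#  - lines whose indentation contains a tab (YAML forbids tabs there)
#  - '---' document separators (values are expected in a single document)
# Prints "file:line: problem" for each hit; returns 1 if anything was found.
check_values_file() {
  awk -v f="$1" '
    /^[ ]*\t/     { printf "%s:%d: tab in indentation\n", f, NR; bad = 1 }
    /^---[ \t]*$/ { printf "%s:%d: document separator\n", f, NR; bad = 1 }
    END           { exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage, from the chart directory:
#   check_values_file values.yaml && helm lint .
```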

3. Deploy

## Deploy into the monitoring namespace
kubectl create ns monitoring
helm install prometheus . -n monitoring
## Check that everything is running
kubectl get all -n monitoring
NAME                                                         READY   STATUS    RESTARTS        AGE
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   1 (2m37s ago)   102s
pod/prometheus-grafana-7c466d88c5-tq9zh                      3/3     Running   0               17m
pod/prometheus-kube-prometheus-operator-67b84b5d9b-z7cws     1/1     Running   0               17m
pod/prometheus-kube-state-metrics-77d5757f57-chrnx           1/1     Running   0               17m
pod/prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0               17m
pod/prometheus-prometheus-node-exporter-gj6rr                1/1     Running   0               17m
pod/prometheus-prometheus-node-exporter-rkl6q                1/1     Running   0               17m

NAME                                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   17m
service/prometheus-grafana                        ClusterIP   172.24.140.186   <none>        80/TCP                       17m
service/prometheus-kube-prometheus-alertmanager   ClusterIP   172.24.60.136    <none>        9093/TCP                     17m
service/prometheus-kube-prometheus-operator       ClusterIP   172.24.106.230   <none>        443/TCP                      17m
service/prometheus-kube-prometheus-prometheus     ClusterIP   172.24.114.84    <none>        9090/TCP                     17m
service/prometheus-kube-state-metrics             ClusterIP   172.24.250.206   <none>        8080/TCP                     17m
service/prometheus-operated                       ClusterIP   None             <none>        9090/TCP                     17m
service/prometheus-prometheus-node-exporter       ClusterIP   172.24.74.178    <none>        9100/TCP                     17m

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-prometheus-node-exporter   2         2         2       2            2           <none>          17m

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-grafana                    1/1     1            1           17m
deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           17m
deployment.apps/prometheus-kube-state-metrics         1/1     1            1           17m

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-grafana-7c466d88c5                    1         1         1       17m
replicaset.apps/prometheus-kube-prometheus-operator-67b84b5d9b   1         1         1       17m
replicaset.apps/prometheus-kube-state-metrics-77d5757f57         1         1         1       17m

NAME                                                                    READY   AGE
statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager   1/1     17m
statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus       1/1     17m
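You can check the pod section mechanically instead of reading the full kubectl get all listing. The sketch below assumes the default kubectl get pods column order (NAME, READY, STATUS, ...). pods_not_ready is a hypothetical helper. Pods of completed Jobs would also be reported, because their STATUS is not Running.

```shell
# Read `kubectl get pods` output on stdin and print every pod whose
# containers are not all ready or whose STATUS is not Running.
# Returns 1 if any such pod is found, 0 otherwise.
pods_not_ready() {
  awk '
    NR == 1 { next }                     # skip the header line
    {
      split($2, r, "/")                  # READY column, e.g. "2/2"
      if (r[1] != r[2] || $3 != "Running") { print $1; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
#   kubectl get pods -n monitoring | pods_not_ready && echo "all pods ready"
```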

Troubleshooting:

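Error 1: "The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes"
Fix: this error typically appears when the CRD is applied with kubectl apply. kubectl apply stores the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation, and that exceeds the size limit. kubectl create does not add this annotation, so create the CRD with it instead. Run the following from the directory where the chart was extracted: cd ./kube-prometheus-stack/crds/ && kubectl create -f crd-prometheuses.yaml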
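Error 2: "failed calling webhook "prometheusrulemutate.monitoring.coreos.com""
Fix: this most likely means a different version of Prometheus was installed earlier, and its webhook resources were not fully removed when it was uninstalled. Find the offending objects with kubectl get mutatingwebhookconfigurations and kubectl get validatingwebhookconfigurations. Delete them, for example: kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io prometheus-kube-prometheus-admission. Then upgrade the deployment again.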
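You can script the lookup step for error 2. The sketch below assumes kubectl's -o name output format (kind.group/name). It keeps only webhook configurations whose name contains a given string. matching_webhooks is a hypothetical helper. Choose the pattern so it matches only the leftover objects, and review the list before deleting anything.

```shell
# Read `kubectl get ... -o name` output on stdin and print only webhook
# configurations (mutating or validating) whose name contains $1.
matching_webhooks() {
  grep -E '^(mutating|validating)webhookconfiguration\.admissionregistration\.k8s\.io/' |
    grep -F -- "$1"
}

# Usage: list first, delete only after checking the output.
#   kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o name \
#     | matching_webhooks prometheus-kube-prometheus-admission
#   ... | xargs kubectl delete
```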

4. Configuration adjustments

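Open prometheus.local and go to Status -> Targets. You will find that Prometheus cannot scrape metrics from some components. Most Kubernetes components serve metrics on a /metrics endpoint over HTTP or HTTPS. Some components cannot be reached from other hosts by default, for example because they listen only on 127.0.0.1. You can expose these by changing their --bind-address flag.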

Add local name resolution for prometheus.local, alertmanager.local and grafana.local to your hosts file.
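The sketch below shows an idempotent hosts-file update. add_hosts_entry is a hypothetical helper. 192.0.2.10 is a placeholder for whatever address your ingress controller is reachable on. Hostnames that already appear in the file are left alone, even if they point at a different address.

```shell
# Append "IP host..." to a hosts file for every hostname not already listed.
# Idempotent: a second run with the same names adds nothing.
add_hosts_entry() {
  local file=$1 ip=$2 missing= h
  shift 2
  for h in "$@"; do
    # Compare whole fields, skipping comment lines, so "grafana.local"
    # does not match e.g. "mygrafana.local".
    awk -v h="$h" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                   END { exit !found }' "$file" || missing="$missing $h"
  done
  [ -n "$missing" ] && printf '%s%s\n' "$ip" "$missing" >> "$file"
  return 0
}

# Usage (as root; 192.0.2.10 is a placeholder ingress address):
#   add_hosts_entry /etc/hosts 192.0.2.10 prometheus.local alertmanager.local grafana.local
```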

4.1 kube-controller-manager

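  • Modify the configuration
    kube-controller-manager exposes its metrics on port 10257. A test request to the node address fails with "curl: (7) Failed connect to 10.49.18.103:10257; Connection refused". Following the official kube-controller-manager documentation, change the component's --bind-address parameter: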
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    ### changed
    #- --bind-address=127.0.0.1
    - --bind-address=0.0.0.0
    ...omitted...

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/#options
--bind-address string  Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces will be used.
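You can make the same --bind-address edit non-interactively. The sketch below assumes the manifest contains the line --bind-address=127.0.0.1, as kubeadm generates it. set_bind_address is a hypothetical helper. It deliberately leaves no backup in /etc/kubernetes/manifests, because the kubelet may treat an extra file there (such as a .bak copy) as another static Pod manifest.

```shell
# Switch "--bind-address=127.0.0.1" to "--bind-address=0.0.0.0" in a static
# Pod manifest. The file is rewritten in place via a temp file that lives
# outside the manifests directory; nothing else is left behind.
set_bind_address() {
  local manifest=$1 tmp rc
  tmp=$(mktemp) || return 1
  sed 's/--bind-address=127\.0\.0\.1[[:space:]]*$/--bind-address=0.0.0.0/' "$manifest" > "$tmp" &&
    cat "$tmp" > "$manifest"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage (as root on the control-plane node):
#   set_bind_address /etc/kubernetes/manifests/kube-controller-manager.yaml
```

The kubelet notices the changed manifest and recreates the static Pod. The same helper would apply to kube-scheduler.yaml in section 4.3.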

  • Verify
lsof -i:10257
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-cont 13515 root    7u  IPv6 38954551      0t0  TCP *:10257 (LISTEN)
kube-cont 13515 root   34u  IPv6 38968508      0t0  TCP master01.pl.hpc:10257->node1:61771 (ESTABLISHED)

### 10257 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10257/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key 
 sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0

...omitted...
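The lsof check above can also be automated: the LISTEN entry should show *:10257, not a loopback address. The sketch below assumes lsof's default column layout, with the address and (LISTEN) as the last two fields. listens_on_all_interfaces is a hypothetical helper. Running lsof with -nP keeps addresses and ports numeric.

```shell
# Read `lsof -i:PORT` output on stdin; succeed only if some process is in
# LISTEN state on "*:PORT" (all interfaces) rather than on a single address.
listens_on_all_interfaces() {
  awk -v want="*:$1" '$(NF-1) == want && $NF == "(LISTEN)" { ok = 1 } END { exit !ok }'
}

# Usage:
#   lsof -nP -i:10257 | listens_on_all_interfaces 10257 && echo "10257 reachable on all interfaces"
```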

4.2 kube-proxy

  • Modify the configuration
kubectl edit cm -n kube-system kube-proxy
...omitted...
    kind: KubeProxyConfiguration
    ### changed
    # metricsBindAddress: ""
    metricsBindAddress: 0.0.0.0:10249
    mode: ipvs
...omitted...

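Note: kube-proxy reads its configuration from a ConfigMap mounted into the container. Do not add a --metrics-bind-address flag directly to the kube-proxy DaemonSet, because that flag is ignored when a config file is given with --config. If the new setting does not take effect, restart the kube-proxy Pods, for example with kubectl rollout restart daemonset kube-proxy -n kube-system.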
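You can make the ConfigMap edit without an interactive editor. The sketch below assumes the embedded KubeProxyConfiguration contains metricsBindAddress: "", as shown above. enable_proxy_metrics is a hypothetical helper, and it only rewrites an empty value, so an explicit existing setting is left untouched.

```shell
# Read the kube-proxy ConfigMap as YAML on stdin and write it to stdout with
# an empty metricsBindAddress replaced by 0.0.0.0:10249 (all IPv4 interfaces).
enable_proxy_metrics() {
  sed 's/^\([[:space:]]*metricsBindAddress:\) ""[[:space:]]*$/\1 0.0.0.0:10249/'
}

# Usage:
#   kubectl get cm kube-proxy -n kube-system -o yaml | enable_proxy_metrics | kubectl replace -f -
#   kubectl rollout restart daemonset kube-proxy -n kube-system
```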

  • Verify
lsof -i:10249
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-prox 36749 root   13u  IPv6 39030529      0t0  TCP master01.pl.hpc:10249->node1:38685 (ESTABLISHED)
kube-prox 36749 root   14u  IPv6 39100619      0t0  TCP *:10249 (LISTEN)

curl 10.49.18.103:10249/metrics
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 13
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
...omitted...
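To read a single value out of the /metrics output instead of scrolling through it, you can use the sketch below. It assumes the Prometheus text exposition format shown above. metric_value is a hypothetical helper. It only handles samples without labels, so a name such as ..._bucket{le="0"} will not match.

```shell
# Print the value of an unlabelled sample from Prometheus text format on
# stdin. Comment lines (# HELP / # TYPE) are skipped. Returns 1 if absent.
metric_value() {
  awk -v name="$1" '!/^#/ && $1 == name { print $2; found = 1; exit }
                    END { exit !found }'
}

# Usage:
#   curl -s 10.49.18.103:10249/metrics | metric_value go_gc_cycles_automatic_gc_cycles_total
```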

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/
--metrics-bind-address ipport  Default: 127.0.0.1:10249
The IP address with port for the metrics server to serve on (set to ‘0.0.0.0:10249’ for all IPv4 interfaces and ‘[::]:10249’ for all IPv6 interfaces). Set empty to disable. This parameter is ignored if a config file is specified by --config.

4.3 kube-scheduler

  • Modify the configuration
vim /etc/kubernetes/manifests/kube-scheduler.yaml
...omitted...
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    ### changed
    #- --bind-address=127.0.0.1
    - --bind-address=0.0.0.0
...omitted...
  • Verify
lsof -i:10259
COMMAND     PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
kube-sche 24404 root    7u  IPv6 38957456      0t0  TCP *:10259 (LISTEN)
kube-sche 24404 root   10u  IPv6 39009914      0t0  TCP master01.pl.hpc:10259->node1:29900 (ESTABLISHED)
### 10259 is the secure port and accepts HTTPS requests
curl https://10.49.18.103:10259/metrics --cacert /etc/kubernetes/pki/ca.crt --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt --key /etc/kubernetes/pki/apiserver-kubelet-client.key   --insecure
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="7200"} 0
...omitted...
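With all three components reconfigured, you can probe their endpoints in one pass. metrics_url is a hypothetical helper. It only encodes the ports and schemes used in this article: 10257 and 10259 over HTTPS, and 10249 over HTTP. The -k in the usage comment mirrors the --insecure used in the kube-scheduler check above.

```shell
# Print the metrics URL for one of the components tuned in section 4.
metrics_url() {
  local host=$2
  case $1 in
    kube-controller-manager) echo "https://$host:10257/metrics" ;;
    kube-scheduler)          echo "https://$host:10259/metrics" ;;
    kube-proxy)              echo "http://$host:10249/metrics" ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Usage: print the HTTP status code returned by each endpoint.
#   for c in kube-controller-manager kube-scheduler kube-proxy; do
#     curl -s -o /dev/null -w "$c %{http_code}\n" -k \
#       --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
#       --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
#       "$(metrics_url "$c" 10.49.18.103)"
#   done
```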

Reference: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/
--bind-address string  Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces will be used.
