
Deploying a Prometheus + node-exporter + Grafana Monitoring Stack on Kubernetes 1.17.6

1. Persistent Prometheus Installation
We store Prometheus data on an NFS-backed volume and manage the configuration with a ConfigMap. All Prometheus resources are deployed in the kube-system namespace.
1.1 Keep all the Prometheus YAML files in one directory

mkdir /app/prometheus -p && cd /app/prometheus

1.2 Generate the configuration file

cat > prometheus.configmap.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
EOF

1.3 Create the resource

[root@k8s-node1 prometheus]# kubectl apply -f prometheus.configmap.yaml
configmap/prometheus-config created
[root@k8s-node1 prometheus]# kubectl get configmaps -n kube-system | grep prometheus
prometheus-config 1 27h
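
Optionally, you can lint the embedded prometheus.yml before relying on it. A minimal sketch, assuming Docker is available on the node, using the promtool binary shipped in the same image we deploy later:

# Extract the embedded prometheus.yml from the ConfigMap
# (the dot in the key name must be escaped inside the jsonpath expression).
kubectl get configmap prometheus-config -n kube-system \
  -o jsonpath='{.data.prometheus\.yml}' > /tmp/prometheus.yml

# Validate the file with promtool from the prom/prometheus image.
docker run --rm -v /tmp:/cfg --entrypoint /bin/promtool \
  prom/prometheus:v2.4.3 check config /cfg/prometheus.yml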

1.4 Create the Prometheus Deployment and Service

[root@k8s-01 prometheus]# cat > prometheus.deploy.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: prom/prometheus:v2.4.3
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=30d"
        - "--web.enable-admin-api" # enables the admin HTTP API, including features such as deleting time series
        - "--web.enable-lifecycle" # enables hot reload: POST to localhost:9090/-/reload takes effect immediately
        ports:
        - containerPort: 9090
          protocol: TCP
          name: http
        volumeMounts:
        - mountPath: "/prometheus"
          subPath: prometheus
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      securityContext:
        runAsUser: 0
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus
      - configMap:
          name: prometheus-config
        name: config-volume
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: prometheus
  labels:
    app: prometheus
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
  - name: http
    port: 9090
EOF

When starting Prometheus, besides pointing --config.file at prometheus.yml (from the ConfigMap), we set --storage.tsdb.path for the TSDB data directory and --storage.tsdb.retention to keep 30 days of data. The --web.enable-admin-api flag grants access to the admin HTTP API, which includes features such as deleting time series, and --web.enable-lifecycle enables hot reloading: whenever the prometheus.yml in the ConfigMap is updated, a POST to localhost:9090/-/reload applies it immediately.
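
As a quick illustration of what these two flags unlock (replace the address with your Prometheus Service IP or NodeIP:NodePort):

# --web.enable-lifecycle: hot-reload the configuration without a restart.
curl -X POST http://<prometheus-ip>:9090/-/reload

# --web.enable-admin-api: administrative operations such as taking a TSDB
# snapshot or deleting time series by matcher.
curl -X POST http://<prometheus-ip>:9090/api/v1/admin/tsdb/snapshot
curl -X POST -g 'http://<prometheus-ip>:9090/api/v1/admin/tsdb/delete_series?match[]=up'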

We also added a securityContext with runAsUser set to 0, because Prometheus normally runs as the nobody user and may otherwise hit permission errors writing to the volume.
1.5 Deploy the NFS service

[root@k8s-node1 prometheus]# yum install nfs-utils rpcbind -y

1.6 On the NFS server, run the following

[root@k8s-node1 prometheus]# mkdir -p /app/k8s
[root@k8s-node1 prometheus]# systemctl start rpcbind
[root@k8s-node1 prometheus]# systemctl enable rpcbind
[root@k8s-node1 prometheus]# systemctl start nfs
[root@k8s-node1 prometheus]# systemctl enable nfs
[root@k8s-node1 prometheus]# echo "/app/k8s 192.168.29.175(rw,no_root_squash,sync)" >> /etc/exports
[root@k8s-node1 prometheus]# exportfs -r   # apply the export configuration
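
Before wiring the export into Kubernetes, it is worth confirming it is reachable from a client node (nfs-utils must be installed there as well):

# List the exports offered by the NFS server.
showmount -e 192.168.29.175

# Optionally mount it once to verify read/write access, then clean up.
mount -t nfs 192.168.29.175:/app/k8s /mnt
touch /mnt/.rw-test && rm /mnt/.rw-test
umount /mnt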

1.7 Create the prometheus-volume file

[root@k8s-node1 prometheus]# cat > prometheus-volume.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.29.175
    path: /app/k8s
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

This creates a PV and a PVC backed by the simple NFS export above.

[root@k8s-node1 prometheus]# kubectl create -f prometheus-volume.yaml
persistentvolume/prometheus created
persistentvolumeclaim/prometheus created
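
Before continuing, confirm that the claim has bound to the volume:

# Both should report STATUS "Bound".
kubectl get pv prometheus
kubectl get pvc prometheus -n kube-system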

1.8 Create the RBAC objects, since Prometheus needs access to resources inside the cluster

[root@k8s-node1 prometheus]# cat > prometheus-rbac.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
EOF

1.9 Apply the RBAC file

[root@k8s-node1 prometheus]# kubectl create -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
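
You can sanity-check the binding with kubectl auth can-i, impersonating the new ServiceAccount:

# Each of these should print "yes" if the ClusterRoleBinding is effective.
kubectl auth can-i list nodes \
  --as=system:serviceaccount:kube-system:prometheus
kubectl auth can-i get nodes/metrics \
  --as=system:serviceaccount:kube-system:prometheus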

1.10 With the ConfigMap, volume, and RBAC objects in place, create the Prometheus Deployment

[root@k8s-node1 prometheus]# kubectl create -f prometheus.deploy.yaml
deployment.apps/prometheus created

1.11 Open the Prometheus web UI (NodeIP + NodePort)

(Screenshots of the Prometheus web UI omitted.)

2. Monitoring the Cluster Nodes
First we want to monitor the cluster's nodes. There are already plenty of mature options for this, such as Nagios or Zabbix, and we could even collect the data ourselves; here we have Prometheus scrape the metrics exposed by node_exporter. node_exporter collects a server's runtime metrics and covers almost every common monitoring point, such as cpu, diskstats, loadavg, meminfo, and netstat; see the node_exporter GitHub repo for the full list of collectors.

We deploy it with a DaemonSet controller so that every node runs exactly one Pod, and the set scales automatically as nodes are added to or removed from the cluster.

[root@k8s-node1 prometheus]# cat > prometheus-node-exporter.yaml <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
EOF

2.1 Create node-exporter and check the Pods

[root@k8s-node1 prometheus]# kubectl create -f prometheus-node-exporter.yaml
daemonset.apps/node-exporter created
[root@k8s-node1 prometheus]# kubectl get pod -n kube-system -o wide | grep node-exporter
node-exporter-cmjkc 1/1 Running 0 33h 192.168.29.176 k8s-node2 <none> <none>
node-exporter-wl5lx 1/1 Running 0 27h 192.168.29.182 k8s-node3 <none> <none>
node-exporter-xsv9z 1/1 Running 0 33h 192.168.29.175 k8s-node1 <none> <none>
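
Because the DaemonSet runs with hostNetwork: true, each exporter listens on port 9100 of its node directly, so you can spot-check one without going through Kubernetes:

# Sample the raw metrics from the exporter on k8s-node1.
curl -s http://192.168.29.175:9100/metrics | head -n 20

# For example, filter for the load-average metrics node_load1/5/15.
curl -s http://192.168.29.175:9100/metrics | grep '^node_load'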

2.2 Update the Prometheus ConfigMap

[root@k8s-node1 prometheus]# cat prometheus.configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

2.3 Reload the configuration

[root@k8s-node1 prometheus]# kubectl apply -f prometheus.configmap.yaml
configmap/prometheus-config unchanged
[root@k8s-node1 prometheus]# curl -X POST http://10.0.0.178:9090/-/reload
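
To confirm the reload took effect, you can ask Prometheus's HTTP API for its live configuration and target list:

# The currently loaded configuration; the new scrape jobs should appear here.
curl -s http://10.0.0.178:9090/api/v1/status/config

# Active scrape targets and their health.
curl -s http://10.0.0.178:9090/api/v1/targets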

2.4 Check target status

(Screenshot of the Prometheus Targets page omitted.)

3. Installing Grafana to Monitor the Kubernetes Cluster
3.1 Install Grafana persistently with a Deployment

cat > grafana_deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
    k8s-app: grafana
spec:
  selector:
    matchLabels:
      k8s-app: grafana
      app: grafana
  revisionHistoryLimit: 10
  template:
    metadata:
      labels:
        app: grafana
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:5.3.4
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: 12345@com
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 300m
            memory: 1024Mi
          requests:
            cpu: 300m
            memory: 1024Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          subPath: grafana
          name: storage
      securityContext:
        fsGroup: 472
        runAsUser: 472
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
EOF

This uses the grafana/grafana:5.3.4 image and adds health checks plus resource requests and limits. The important environment variables are GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD, Grafana's admin account and password.

Grafana keeps dashboards, plugins, and other state under /var/lib/grafana, so we persist that directory and declare a mount for it. Because the user ID and group ID used by the image changed in the 5.x line, we also add a securityContext that sets the user and fsGroup to 472.
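
Note that a plaintext password in the Deployment is readable by anyone who can view the manifest; a common variant (not part of the original setup, sketched here as an option) is to move the credentials into a Secret and reference it from the env section:

kubectl create secret generic grafana-admin -n kube-system \
  --from-literal=admin-user=admin \
  --from-literal=admin-password=12345@com

Then, in grafana_deployment.yaml, replace the two hard-coded env entries with:

env:
- name: GF_SECURITY_ADMIN_USER
  valueFrom:
    secretKeyRef:
      name: grafana-admin
      key: admin-user
- name: GF_SECURITY_ADMIN_PASSWORD
  valueFrom:
    secretKeyRef:
      name: grafana-admin
      key: admin-password
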
3.2 Add a PV and PVC for Grafana to bind

cat > grafana_volume.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.29.175
    path: /app/k8s
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

3.3 Create a Service of type NodePort

cat > grafana_svc.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: grafana
EOF

Because the image's group ID changed after version 5.1 (choosing a pre-5.1 image avoids this class of error), and because once /var/lib/grafana is mounted from the PVC its contents may not be owned by the grafana user, we also need a Job that fixes the ownership of the directory.

cat > grafana_job.yaml <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: grafana-chown
  namespace: kube-system
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: grafana-chown
        command: ["chown", "-R", "472:472", "/var/lib/grafana"]
        image: busybox
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: storage
          subPath: grafana
          mountPath: /var/lib/grafana
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
EOF

The Job runs a busybox image to change the owner of /var/lib/grafana to UID/GID 472.

3.4 Create the PV and PVC, then the Job, Deployment, and Service (the order matters here)

[root@k8s-node1 prometheus]# kubectl create -f grafana_volume.yaml
persistentvolume/grafana created
persistentvolumeclaim/grafana created
[root@k8s-node1 prometheus]# kubectl create -f grafana_job.yaml
job.batch/grafana-chown created
[root@k8s-node1 prometheus]# kubectl apply -f grafana_deployment.yaml
deployment.apps/grafana created
[root@k8s-node1 prometheus]# kubectl create -f grafana_svc.yaml

3.5 Verify the result

[root@k8s-node1 prometheus]# kubectl get pod,svc -n kube-system | grep grafana
pod/grafana-54f6755f88-5dwl7 1/1 Running 0 27h
pod/grafana-chown-lcw2v 0/1 Completed 0 27h
service/grafana NodePort 10.0.0.202 <none> 3000:9006/TCP 27h
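
Before opening a browser, you can check that the chown Job ran cleanly and that Grafana answers its health endpoint through the NodePort (9006 in the output above):

# The Job's log should be empty on success (chown prints nothing).
kubectl logs job/grafana-chown -n kube-system

# The same /api/health endpoint used by the readiness and liveness probes.
curl -s http://192.168.29.175:9006/api/health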

3.6 Access the Grafana web UI at NodeIP + NodePort

(Screenshot of the Grafana web UI omitted.)

3.7 On first login, add a data source

(Screenshots of adding the Prometheus data source omitted.)
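
If you prefer automation over clicking through the UI, the same data source can be created with Grafana's HTTP API; a sketch using the in-cluster DNS name of the Prometheus Service created earlier:

curl -s -u admin:12345@com -H 'Content-Type: application/json' \
  -X POST http://192.168.29.175:9006/api/datasources \
  -d '{
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://prometheus.kube-system.svc:9090",
        "access": "proxy",
        "isDefault": true
      }'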

Once the data source is added, the next step is to create a New dashboard.


Grafana provides a large library of dashboard templates, a bit like a Docker image registry, and importing one is simple: click Dashboard at the top.

(Screenshots of the dashboard template import workflow omitted.)

Per-Pod memory usage, sorted descending (on Kubernetes 1.16+ the cAdvisor labels are pod/image rather than pod_name/image_name):

sort_desc(sum(container_memory_usage_bytes{image!="", pod!=""}) by (pod))


Per-Pod CPU usage rate for the selected $pod dashboard variable:

sum by (pod)(rate(container_cpu_usage_seconds_total{pod="$pod"}[1m]))


Container monitoring templates: 315, 8588, 3146, 8685
Host monitoring templates: 8919, 9276, 10467, 10171, 9965
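
These IDs refer to dashboards published on grafana.com; each can be typed into the import dialog directly, or pulled and imported via the API. A sketch for host template 8919 (the DS_PROMETHEUS input name is what most templates declare; check the __inputs section of the downloaded JSON if yours differs):

# Download the latest revision of dashboard 8919 from grafana.com.
curl -s https://grafana.com/api/dashboards/8919/revisions/latest/download \
  -o /tmp/dash-8919.json

# Import it, mapping the template's data source input to our "Prometheus" source.
curl -s -u admin:12345@com -H 'Content-Type: application/json' \
  -X POST http://192.168.29.175:9006/api/dashboards/import \
  -d "{\"dashboard\": $(cat /tmp/dash-8919.json),
       \"overwrite\": true,
       \"inputs\": [{\"name\": \"DS_PROMETHEUS\", \"type\": \"datasource\",
                     \"pluginId\": \"prometheus\", \"value\": \"Prometheus\"}]}"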
