1. Installing Prometheus with Persistent Storage
We mount an NFS volume to persist the Prometheus data and manage the configuration with a ConfigMap. All of the Prometheus resources live in the kube-system namespace.
1.1 It's recommended to keep all of the Prometheus YAML files in one directory
mkdir /app/prometheus -p && cd /app/prometheus
1.2 Generate the configuration file
cat > prometheus.configmap.yaml <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
EOF
1.3 Create the resource
[root@k8s-node1 prometheus]# kubectl apply -f prometheus.configmap.yaml
configmap/prometheus-config created

[root@k8s-node1 prometheus]# kubectl get configmaps -n kube-system |grep prometheus
prometheus-config   1      27h
1.4 Create the Prometheus Deployment
[root@k8s-01 prometheus]# cat > prometheus.deploy.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: prom/prometheus:v2.4.3
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=30d"
        - "--web.enable-admin-api"  # enables the admin HTTP API, e.g. deleting time series
        - "--web.enable-lifecycle"  # enables hot reload via POST to localhost:9090/-/reload
        ports:
        - containerPort: 9090
          protocol: TCP
          name: http
        volumeMounts:
        - mountPath: "/prometheus"
          subPath: prometheus
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      securityContext:
        runAsUser: 0
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus
      - configMap:
          name: prometheus-config
        name: config-volume

---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: prometheus
  labels:
    app: prometheus
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
  - name: http
    port: 9090
EOF
At startup, besides pointing at prometheus.yml (from the ConfigMap), the arguments set --storage.tsdb.path for where the TSDB data is stored and --storage.tsdb.retention for how long it is kept (30 days here). --web.enable-admin-api opens access to the admin HTTP API, which includes operations such as deleting time series, and --web.enable-lifecycle enables hot reload: once the prometheus.yml in the ConfigMap is updated, a POST to localhost:9090/-/reload applies it immediately.
We also added a securityContext with runAsUser set to 0. Prometheus normally runs as the nobody user, and without this setting it may run into permission problems on the data directory.
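As a concrete example, a reload and an admin-API call look like this once the stack is up (a sketch; <prometheus-host> is a placeholder for the Service IP or NodeIP:NodePort):

kubectl apply -f prometheus.configmap.yaml            # update the ConfigMap first
curl -X POST http://<prometheus-host>:9090/-/reload   # hot reload, requires --web.enable-lifecycle
# delete a time series via the admin API, requires --web.enable-admin-api
curl -g -X POST 'http://<prometheus-host>:9090/api/v1/admin/tsdb/delete_series?match[]=up{job="prometheus"}'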
1.5 Deploy the NFS service
[root@k8s-node1 prometheus]# yum install nfs-utils rpcbind -y
1.6 On the NFS server, run:
[root@k8s-node1 prometheus]# mkdir -p /app/k8s
[root@k8s-node1 prometheus]# systemctl start rpcbind
[root@k8s-node1 prometheus]# systemctl enable rpcbind
[root@k8s-node1 prometheus]# systemctl enable nfs
[root@k8s-node1 prometheus]# echo "/app/k8s 192.168.29.175(rw,no_root_squash,sync)" >>/etc/exports
[root@k8s-node1 prometheus]# exportfs -r   # re-export to apply the change
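It's worth verifying the export is visible before moving on (this assumes the nfs service itself has been started, e.g. systemctl start nfs):

showmount -e 192.168.29.175                                # should list /app/k8s
mount -t nfs 192.168.29.175:/app/k8s /mnt && umount /mnt   # throwaway test mount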
1.7 Create the prometheus-volume manifest
[root@k8s-node1 prometheus]# cat >prometheus-volume.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.29.175
    path: /app/k8s

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
Here a simple NFS export serves as the storage backend for a PV & PVC pair.
[root@k8s-node1 prometheus]# kubectl create -f prometheus-volume.yaml
persistentvolume/prometheus created
persistentvolumeclaim/prometheus created
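If the claim bound correctly, both objects report a Bound status:

[root@k8s-node1 prometheus]# kubectl get pv prometheus
[root@k8s-node1 prometheus]# kubectl get pvc prometheus -n kube-system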
1.8 Create the RBAC objects, since Prometheus needs to access resources inside the cluster
[root@k8s-node1 prometheus]# cat >prometheus-rbac.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
EOF
1.9 Apply the RBAC manifest
[root@k8s-node1 prometheus]# kubectl create -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
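A quick sanity check that the binding took effect (this should print yes):

[root@k8s-node1 prometheus]# kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:prometheus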
1.10 With the ConfigMap, volume, and RBAC in place, create the Prometheus Deployment
[root@k8s-node1 prometheus]# kubectl create -f prometheus.deploy.yaml
deployment.extensions/prometheus created
1.11 Log in to the Prometheus web UI (NodeIP + NodePort)
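The assigned NodePort can be read from the Service (the PORT(S) column shows it as 9090:<nodeport>/TCP); browse to any node's IP on that port:

[root@k8s-node1 prometheus]# kubectl get svc prometheus -n kube-system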
2. Monitoring Cluster Nodes
The first thing to monitor is the cluster's nodes. There are plenty of mature options for this (Nagios, Zabbix, or even home-grown collectors); here we have Prometheus scrape the node metrics exposed by node_exporter, which covers virtually every common server-level metric: cpu, diskstats, loadavg, meminfo, netstat, and so on. The full collector list is in the node_exporter GitHub repo.
We deploy it with a DaemonSet controller, so every node runs exactly one Pod, and nodes added to or removed from the cluster are picked up automatically.
[root@k8s-node1 prometheus]# cat >prometheus-node-exporter.yaml<<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    name: node-exporter
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
EOF
2.1 Create the node-exporter DaemonSet and check the Pods
[root@k8s-node1 prometheus]# kubectl create -f prometheus-node-exporter.yaml
daemonset.extensions/node-exporter created

[root@k8s-node1 prometheus]# kubectl get pod -n kube-system -o wide|grep node-exporter
node-exporter-cmjkc   1/1   Running   0   33h   192.168.29.176   k8s-node2   <none>   <none>
node-exporter-wl5lx   1/1   Running   0   27h   192.168.29.182   k8s-node3   <none>   <none>
node-exporter-xsv9z   1/1   Running   0   33h   192.168.29.175   k8s-node1   <none>   <none>
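Because the DaemonSet runs with hostNetwork: true, every node serves metrics directly on port 9100; a quick spot check against one of the nodes listed above:

[root@k8s-node1 prometheus]# curl -s http://192.168.29.175:9100/metrics | grep '^node_load'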
2.2 Update the Prometheus ConfigMap
[root@k8s-node1 prometheus]# cat prometheus.configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
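Note that the kubernetes-service-endpoints job only keeps Services that opt in through annotations. A hypothetical annotated Service would look like this (my-app, port 8080, and the path are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-app                      # hypothetical service
  namespace: default
  annotations:
    prometheus.io/scrape: "true"    # matched by the keep rule above
    prometheus.io/port: "8080"      # rewritten into __address__
    prometheus.io/path: "/metrics"  # optional; /metrics is the default
spec:
  selector:
    app: my-app
  ports:
  - port: 8080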
2.3 Reload the configuration
[root@k8s-node1 prometheus]# kubectl apply -f prometheus.configmap.yaml
configmap/prometheus-config unchanged

[root@k8s-node1 prometheus]# curl -X POST http://10.0.0.178:9090/-/reload
2.4 Check the status
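Status → Targets in the web UI should now list the kubernetes-node, kubernetes-cadvisor, kubernetes-apiservers, and kubernetes-service-endpoints jobs (the mounted ConfigMap can take up to a minute to sync into the Pod, so reload after a short wait). Equivalently, in the expression browser:

# 1 = target scraped successfully, 0 = scrape failing
up{job=~"kubernetes-.*"}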
3. Installing Grafana to Monitor the Kubernetes Cluster
3.1 Install Grafana with a Deployment and persistent storage
cat > grafana_deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
    k8s-app: grafana
spec:
  selector:
    matchLabels:
      k8s-app: grafana
      app: grafana
  revisionHistoryLimit: 10
  template:
    metadata:
      labels:
        app: grafana
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:5.3.4
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: 12345@com
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 300m
            memory: 1024Mi
          requests:
            cpu: 300m
            memory: 1024Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          subPath: grafana
          name: storage
      securityContext:
        fsGroup: 472
        runAsUser: 472
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
EOF
This uses the grafana/grafana:5.3.4 image and adds health checks and resource requests/limits. The important environment variables are GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD, which set Grafana's admin account and password.
Grafana keeps its dashboards and plugins under /var/lib/grafana, so we persist that directory and declare a mount for it. Because the uid and gid used by the image changed as of this release line, we also add a securityContext that pins the user and group IDs to 472.
3.2 Add a PV and PVC for Grafana
cat > grafana_volume.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.29.175
    path: /app/k8s
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
3.3 Create a Service of type NodePort
cat > grafana_svc.yaml<<EOF
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
  - port: 3000
  selector:
    app: grafana
EOF
The group id changed after version 5.1 (choosing a pre-5.1 image avoids this class of error), and once /var/lib/grafana is mounted from the PVC the directory may not be owned by the grafana user, so we also need a Job to fix the ownership:
cat > grafana_job.yaml <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: grafana-chown
  namespace: kube-system
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: grafana-chown
        command: ["chown", "-R", "472:472", "/var/lib/grafana"]
        image: busybox
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: storage
          subPath: grafana
          mountPath: /var/lib/grafana
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
EOF
The Job uses a busybox image to chown /var/lib/grafana to 472:472.
3.4 Create the PV and PVC (apply the manifests in this order)
[root@k8s-node1 prometheus]# kubectl create -f grafana_volume.yaml
persistentvolume/grafana created
persistentvolumeclaim/grafana created
[root@k8s-node1 prometheus]# kubectl create -f grafana_job.yaml
job.batch/grafana-chown created
[root@k8s-node1 prometheus]# kubectl apply -f grafana_deployment.yaml
deployment.apps/grafana created
[root@k8s-node1 prometheus]# kubectl create -f grafana_svc.yaml
3.5 Check the result
[root@k8s-node1 prometheus]# kubectl get pod,svc -n kube-system |grep grafana
pod/grafana-54f6755f88-5dwl7   1/1   Running     0   27h
pod/grafana-chown-lcw2v        0/1   Completed   0   27h
service/grafana   NodePort   10.0.0.202   <none>   3000:9006/TCP   27h
3.6 Access the Grafana web UI via NodeIP + NodePort
3.7 On first login, add a data source
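Since Grafana runs in the same cluster, the data source can simply point at the Prometheus Service created in part 1 (assuming the default cluster.local DNS suffix):

Type:   Prometheus
URL:    http://prometheus.kube-system.svc.cluster.local:9090
Access: proxy (Server)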
With the data source in place, the next step is a New dashboard.
Grafana offers a large catalog of templates, much like a Docker image registry offers images, and importing one is trivial: click Dashboard at the top. Two PromQL queries that are handy for custom panels (per-Pod memory usage, sorted; per-Pod CPU rate for a $pod template variable):

sort_desc(sum(container_memory_usage_bytes{image!="", pod_name!=""}) by (pod_name))
sum by (pod_name) (rate(container_cpu_usage_seconds_total{pod_name="$pod"}[1m]))
Container monitoring templates (grafana.com dashboard IDs): 315, 8588, 3146, 8685
Host monitoring templates: 8919, 9276, 10467, 10171, 9965
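To import one, go to + → Import in Grafana, enter the dashboard ID (for example 8919 for host monitoring), and pick the Prometheus data source added above; Grafana fetches the template from grafana.com.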