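This post installs a Kafka cluster with persistence using Helm. Kafka has no official Helm chart, so we use the Bitnami one.
Helm chart page: kafka 29.2.0 · bitnami/bitnami
On that page, the Install section shows the install command and Default Values lists the chart defaults.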
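1. Add the Helm repository. myrepo is the local name for the repository; choose any name you like.
helm repo add myrepo https://charts.bitnami.com/bitnami
2. List the Helm repositories
helm repo list
3. Update the Helm repository
helm repo update myrepo
4. List the historical versions of the Kafka chart (the repository was added under the name myrepo, so search with that prefix)
helm search repo myrepo/kafka -l
5. Download the default values for your chart version from Default Values on the chart page, edit them locally, and upload the file to the server (alternatively, download the chart package and edit its default values).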
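The default values can also be fetched from the command line instead of being copied from the web page. A minimal sketch, assuming the repository was added as myrepo as in step 1; the function name fetch_default_values is hypothetical, and the file name in the example call is the one passed to helm install in this post:

```shell
# Save a chart's default values for one version to a local file.
# $1: chart reference, $2: chart version, $3: output file
fetch_default_values() {
  helm show values "$1" --version "$2" > "$3"
}

# Example call (requires helm and the repository from step 1):
# fetch_default_values myrepo/kafka 29.2.0 helm-29.2.0-kafka-3.7.0.yaml
```

Pinning --version keeps the downloaded defaults in step with the chart version that is installed later.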
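These are the defaults I changed:
(1) Client authentication is configured here. My cluster is used only on an internal network, so for simplicity I set PLAINTEXT; decide for yourself what fits your environment.
(2) The authentication mode of the controller listener. For simplicity, also PLAINTEXT.
(3) The listener dedicated to communication between brokers in the Kafka cluster. For simplicity, also PLAINTEXT.
(4) The external listener. For simplicity, also PLAINTEXT.
(5) Kafka data persistence. Create a StorageClass in the K8S cluster beforehand and set the size.
(6) Enable log persistence and specify the StorageClass and size.
(7) Enable the JMX exporter for monitoring Kafka.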
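Items (1) to (7) can be sketched as a small override file. This is an illustration, not the author's exact file: the key names follow my reading of the bitnami/kafka 29.x default values and should be checked against the Default Values of your chart version, and the file name, StorageClass name, and sizes are placeholders.

```shell
# Write a small override file; helm merges it over the chart defaults.
# Key names are assumed from the bitnami/kafka 29.x default values.
# The StorageClass name and the sizes are placeholders.
cat > kafka-values-override.yaml <<'EOF'
listeners:
  client:
    protocol: PLAINTEXT        # (1) client listener, no authentication
  controller:
    protocol: PLAINTEXT        # (2) controller listener
  interbroker:
    protocol: PLAINTEXT        # (3) broker-to-broker listener
  external:
    protocol: PLAINTEXT        # (4) external listener
controller:
  persistence:                 # (5) data persistence
    enabled: true
    storageClass: "my-storage-class"
    size: 20Gi
  logPersistence:              # (6) log persistence
    enabled: true
    storageClass: "my-storage-class"
    size: 10Gi
metrics:
  jmx:
    enabled: true              # (7) JMX exporter
EOF
```

Because helm merges files given with -f over the chart defaults, an override file like this only needs the keys that differ from the defaults.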
(8) Once the values file is edited, upload it to a server that has the kubeconfig and can run helm, then install Kafka:
helm install kafka-cluster myrepo/kafka --version 29.2.0 -f helm-29.2.0-kafka-3.7.0.yaml --kubeconfig=/var/lib/jenkins/.kube/kubeconfig -n kafka-cluster
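(9) Deploy a web UI for easier inspection
GitHub: GitHub - provectus/kafka-ui: Open-Source Web UI for Apache Kafka Management
Create a stateless application (Deployment) named kafka-ui in the K8S cluster, in the namespace where Kafka runs. The example below uses the namespace kafka and a release named kafka in its bootstrap addresses; if you installed with the command above (release and namespace kafka-cluster), change both to match.
Image: provectuslabs/kafka-ui:latest
YAML example: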
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-ui
  labels:
    app: kafka-ui
  namespace: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-ui
  template:
    metadata:
      labels:
        app: kafka-ui
    spec:
      containers:
        - name: kafka-ui
          image: provectuslabs/kafka-ui:latest
          env:
            - name: KAFKA_CLUSTERS_0_NAME
              value: 'Kafka Cluster'
            - name: KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS
              value: 'kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9092,kafka-controller-1.kafka-controller-headless.kafka.svc.cluster.local:9092,kafka-controller-2.kafka-controller-headless.kafka.svc.cluster.local:9092'
            - name: KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL
              value: 'PLAINTEXT'
            - name: AUTH_TYPE
              value: 'LOGIN_FORM'
            - name: SPRING_SECURITY_USER_NAME
              value: 'devops'
            - name: SPRING_SECURITY_USER_PASSWORD
              value: 'mfniqJkDk'
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-ui
  namespace: kafka
spec:
  selector:
    app: kafka-ui
  type: NodePort
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
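The bootstrap address list in the Deployment depends on the release name and namespace, so it is easy to get wrong when either differs from the example. A small sketch that builds the list, assuming pods and the headless service follow the <release>-controller-N and <release>-controller-headless pattern seen in the manifest above (confirm with kubectl get pods,svc in your namespace); bootstrap_servers is a hypothetical helper name:

```shell
# Build the bootstrap server list for a release.
# $1: release name, $2: namespace, $3: number of controller pods,
# $4: client port (defaults to 9092, as in the manifest above)
bootstrap_servers() {
  release="$1"; namespace="$2"; replicas="$3"; port="${4:-9092}"
  servers=""
  i=0
  while [ "$i" -lt "$replicas" ]; do
    host="${release}-controller-${i}.${release}-controller-headless.${namespace}.svc.cluster.local:${port}"
    # Append with a comma separator, except before the first entry.
    servers="${servers:+${servers},}${host}"
    i=$((i + 1))
  done
  printf '%s\n' "$servers"
}

# Addresses for the install command used in this post
# (release kafka-cluster, namespace kafka-cluster, 3 pods assumed):
bootstrap_servers kafka-cluster kafka-cluster 3
```

The printed value can be pasted into KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS.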
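(10) Point a domain name at kafka-ui so that Kafka information can be viewed through the web interface.
(11) With Kafka installed, set up monitoring and alerting next. (Installing Prometheus, Alertmanager, and Grafana on K8S with helm is well documented elsewhere and is not repeated here.)
Go to the namespace where monitoring is deployed, find the Secret entry that holds the scrape configuration, and edit it to add a job for kafka-exporter. (The helm install above enabled jmx_exporter; I later installed kafka-exporter separately. Install whichever you need; the two exporters can coexist.)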
- job_name: kafka_cluster_exporter
  metrics_path: /metrics
  static_configs:
    - targets:
        - 172.22.6.6:9308  # replace with the address and port of your own kafka-exporter service
Prometheus now collects monitoring metrics through kafka-exporter.
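Before relying on the scrape job, it can help to confirm that the exporter actually serves metrics. A small sketch, assuming the exporter exposes a kafka_brokers gauge without labels (the alert rules in this post query that metric); broker_count is a hypothetical helper name:

```shell
# Read Prometheus text-format metrics on stdin and print the kafka_brokers value.
broker_count() {
  awk '$1 == "kafka_brokers" { print $2 }'
}

# Example against the exporter address from the scrape job (requires curl):
# curl -s http://172.22.6.6:9308/metrics | broker_count
```

An empty result suggests the exporter is unreachable or not connected to the cluster.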
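(12) Configure Kafka alert rules and send alerts through DingTalk. To reduce alert noise, I added a custom label and route only the Kafka alerts to DingTalk, which avoids being flooded by the K8S cluster's other alerts.
(1) First, configure a DingTalk robot in the DingTalk group and obtain its token and secret.
Alertmanager receivers do not support DingTalk URLs directly, so the plugin container prometheus-webhook-dingtalk must be deployed.
One thing to note: when the receiver is DingTalk (webhook_configs), the alert template is specified in the prometheus-webhook-dingtalk plugin, not in the Alertmanager configuration file.
Write the prometheus-webhook-dingtalk configuration file and template:
vim prometheus-webhook-dingtalk-config.yaml (remember to substitute your own DingTalk URL token and secret).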
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-webhook-dingtalk-config
  namespace: monitoring
data:
  config.yml: |-
    templates:
      - /etc/prometheus-webhook-dingtalk/default.tmpl
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=1f315a3d3b68ae9a5df0f6cde411902c493a10bc3d6ed6bbba8cd8b4bcd1c848
        secret: SEC4d160d1d987b58a19e9a825b83715b253d0b6d0c255b5abb28c265798c535b7e
        message:
          text: '{{ template "default.tmpl" . }}'

  default.tmpl: |
    {{ define "default.tmpl" }}

    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts.Firing -}}

    ============ = **<font color='#FF0000'>Alert</font>** = =============

    **Alert name:** {{ $alert.Labels.alertname }}
    **Severity:** {{ $alert.Labels.severity }}
    **Status:** {{ .Status }}
    **Instance:** {{ $alert.Labels.instance }} {{ $alert.Labels.device }}
    **Summary:** {{ .Annotations.summary }}
    **Details:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}
    **Started at:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ============ = end = =============
    {{- end }}
    {{- end }}

    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts.Resolved -}}

    ============ = <font color='#00FF00'>Resolved</font> = =============

    **Instance:** {{ .Labels.instance }}
    **Alert name:** {{ .Labels.alertname }}
    **Severity:** {{ $alert.Labels.severity }}
    **Status:** {{ .Status }}
    **Summary:** {{ $alert.Annotations.summary }}
    **Details:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}
    **Started at:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    **Resolved at:** {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}

    ============ = **end** = =============
    {{- end }}
    {{- end }}
    {{- end }}
(2) Deploy the prometheus-webhook-dingtalk service. If it was already installed along with Prometheus via helm, just mount the configuration file above into it.
vim dingtalk-webhook-deploy.yaml
apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: monitoring
  labels:
    app: dingtalk
spec:
  selector:
    app: dingtalk
  ports:
    - name: dingtalk
      port: 8060
      protocol: TCP
      targetPort: 8060

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      name: dingtalk
      labels:
        app: dingtalk
    spec:
      containers:
        - name: dingtalk
          image: timonwong/prometheus-webhook-dingtalk:v2.1.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8060
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus-webhook-dingtalk
      volumes:
        - name: config
          configMap:
            name: prometheus-webhook-dingtalk-config
kubectl -n monitoring apply -f prometheus-webhook-dingtalk-config.yaml
kubectl -n monitoring apply -f dingtalk-webhook-deploy.yaml
(3) Create the custom alerting rule resource from YAML.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: ack-prometheus-operator
    release: ack-prometheus-operator
  name: ack-prometheus-operator-kafka.rules
  namespace: monitoring
spec:
  groups:
    - name: kafka-cluster-exporter
      rules:
        - alert: KafkaClusterExporterDown
          annotations:
            description: Kafka Cluster Exporter has been down for 1 minute.
            summary: Kafka Cluster Exporter is down
          expr: 'up{job="kafka_cluster_exporter"} == 0'
          for: 1m
          labels:
            product: kafka-cluster
            severity: critical
            status: 严重
    - name: kafka消费滞后告警
      rules:
        - alert: kafka消费滞后
          annotations:
            description: >-
              {{$.Labels.consumergroup}}##{{$.Labels.topic}}: consumer lag above 500 for 3 minutes (current: {{$value}})
            summary: Kafka consumer lag
          expr: >-
            sum(kafka_consumergroup_lag{topic!="sop_free_study_fix-student_wechat_detail"})
            by (consumergroup, topic) > 500
          for: 3m
          labels:
            product: kafka-cluster
            severity: warning
            status: 严重
        - alert: jshop cluster kafka down
          annotations:
            description: 'kafka-cluster-broker down'
            summary: jshop-cluster-broker count is below 3
          expr: 'kafka_brokers{job="kafka_cluster_exporter"} < 3'
          for: 1m
          labels:
            product: kafka-cluster
            severity: warning
            status: 严重
kubectl -n monitoring create -f prometheus-kafka.yaml
The alerts in this YAML carry a custom label, product, with the value kafka-cluster, so that Alertmanager can filter on it later.
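The lag rule sums kafka_consumergroup_lag per consumer group and topic and fires above 500. The same aggregation can be tried against raw exporter output to see which groups would currently exceed the threshold. A rough sketch, assuming each kafka_consumergroup_lag sample in the exporter's text output carries consumergroup and topic labels (as the rule's by clause implies); lag_over is a hypothetical helper, and unlike the rule it does not exclude any topic:

```shell
# Read Prometheus text-format metrics on stdin, sum kafka_consumergroup_lag
# per (consumergroup, topic), and print the pairs whose sum exceeds $1.
lag_over() {
  awk -v limit="$1" '
    index($0, "kafka_consumergroup_lag{") == 1 {
      group = ""; topic = ""
      # Extract the label values between the quotes.
      if (match($0, /consumergroup="[^"]*"/)) group = substr($0, RSTART + 15, RLENGTH - 16)
      if (match($0, /topic="[^"]*"/))         topic = substr($0, RSTART + 7, RLENGTH - 8)
      sum[group " " topic] += $NF
    }
    END {
      # Output order across pairs is not defined by awk.
      for (k in sum) if (sum[k] > limit + 0) print k, sum[k]
    }
  '
}

# Example against the exporter address from the scrape job (requires curl):
# curl -s http://172.22.6.6:9308/metrics | lag_over 500
```

Each output line has the form "consumergroup topic total-lag".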
(4) Configure the Alertmanager rules.
Note: change the webhook_configs address below to the address of your own prometheus-webhook-dingtalk service.
global:
  resolve_timeout: 5m
receivers:
  - name: 'null'
  - name: 'dingtalk'
    webhook_configs:
      - url: 'http://172.22.7.34:8060/dingtalk/webhook1/send'
        send_resolved: true
route:
  group_by:
    - alertname
  group_interval: 5m
  group_wait: 30s
  receiver: "null"
  repeat_interval: 1h
  routes:
    - receiver: "dingtalk"
      match:
        product: 'kafka-cluster'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'kafka', 'instance']
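With this routing, an alert labelled product=kafka-cluster should reach the dingtalk receiver, and anything else should fall through to the null receiver. One way to try this is to post a hand-made alert to Alertmanager's v2 API. A minimal sketch that builds the payload; test_alert_payload and the alert name KafkaRouteTest are hypothetical, and ALERTMANAGER_URL is a placeholder for your own Alertmanager address:

```shell
# Build an Alertmanager v2 alert payload with the given alertname and product label.
# $1: alertname, $2: value of the product label
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","product":"%s","severity":"warning"}}]\n' "$1" "$2"
}

# Print the payload for inspection.
test_alert_payload KafkaRouteTest kafka-cluster

# To send it (requires curl and a reachable Alertmanager):
# curl -s -X POST -H 'Content-Type: application/json' \
#   -d "$(test_alert_payload KafkaRouteTest kafka-cluster)" \
#   "$ALERTMANAGER_URL/api/v2/alerts"
```

Sending the same payload with a different product value is a quick check that unrelated alerts do not reach DingTalk.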
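This post is long and my time was limited, so please point out anything that is incorrect. If you run into problems during deployment, leave a comment and we can discuss.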