
Installing a Kafka Cluster with Persistent Data on a K8S Cluster and Setting Up a Web UI


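Here we install a Kafka cluster (with persistence) via Helm. Kafka has no official Helm chart, so we use the Bitnami one.

Helm chart link: kafka 29.2.0 · bitnami/bitnami

On that page, the Install section shows the installation command and the Default Values section lists the defaults.

1. Add the Helm repository. myrepo is the local name for the repository and can be anything you like.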

helm repo add myrepo  https://charts.bitnami.com/bitnami

2. List the Helm repositories

helm repo list

3. Update the Helm repository

helm repo update myrepo

4. List the historical versions of the Kafka chart

helm search repo myrepo/kafka -l

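5. Download the default values for the matching version from Default Values on the chart page, edit them locally, then upload the file to the server (alternatively, download the chart package and edit its default values).

These are the default values I changed:

(1) Whether clients must authenticate is configured on the client listener. I only use the cluster on an internal network, so for simplicity I set it to PLAINTEXT; decide for yourself what suits your environment.

(2) The authentication mode of the controller listener; also PLAINTEXT for simplicity.

(3) The listener dedicated to communication between brokers inside the Kafka cluster; also PLAINTEXT for simplicity.

(4) The external listener; also PLAINTEXT for simplicity.

(5) Enable persistence for Kafka data. A StorageClass must be created in the K8S cluster beforehand, and the size must be set.

(6) Enable log persistence and specify its StorageClass and size.

(7) Enable the JMX exporter for monitoring Kafka.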
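
The value changes in (1) to (7) could be expressed roughly as the overrides below. This is only a sketch: the key names follow the layout of the Bitnami kafka chart's 29.x default values as I understand it and should be checked against the Default Values of the version you install. The StorageClass name my-storage-class and both sizes are placeholders, not values from the original setup.

```yaml
# Sketch of the overrides described in (1) to (7); verify every key
# against the chart's Default Values before use.
listeners:
  client:
    protocol: PLAINTEXT              # (1) clients connect without authentication
  controller:
    protocol: PLAINTEXT              # (2) controller listener
  interbroker:
    protocol: PLAINTEXT              # (3) broker-to-broker listener
  external:
    protocol: PLAINTEXT              # (4) external listener
controller:
  persistence:
    enabled: true                    # (5) persist Kafka data
    storageClass: my-storage-class   # hypothetical name; must already exist
    size: 20Gi                       # placeholder size
  logPersistence:
    enabled: true                    # (6) persist logs
    storageClass: my-storage-class   # hypothetical name
    size: 10Gi                       # placeholder size
metrics:
  jmx:
    enabled: true                    # (7) JMX exporter for monitoring
```

PLAINTEXT on every listener means no authentication and no encryption, which is only reasonable on a trusted internal network.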

(8) Once the values file is edited, upload it to a server that has a kubeconfig and can run helm, then install Kafka with:

helm install kafka-cluster myrepo/kafka --version 29.2.0 -f helm-29.2.0-kafka-3.7.0.yaml --kubeconfig=/var/lib/jenkins/.kube/kubeconfig -n kafka-cluster

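(9) Deploy a web UI for easy inspection

GitHub repository: provectus/kafka-ui: Open-Source Web UI for Apache Kafka Management

Create a stateless application (Deployment) named kafka-ui in the kafka-cluster namespace of the K8S cluster.

Image: provectuslabs/kafka-ui:latest

Example YAML: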

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-ui
  labels:
    app: kafka-ui
  namespace: kafka-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-ui
  template:
    metadata:
      labels:
        app: kafka-ui
    spec:
      containers:
      - name: kafka-ui
        image: provectuslabs/kafka-ui:latest
        env:
        - name: KAFKA_CLUSTERS_0_NAME
          value: 'Kafka Cluster'
        # Example addresses: adjust the pod, headless service, and namespace
        # names to match your own Helm release.
        - name: KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS
          value: 'kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9092,kafka-controller-1.kafka-controller-headless.kafka.svc.cluster.local:9092,kafka-controller-2.kafka-controller-headless.kafka.svc.cluster.local:9092'
        - name: KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL
          value: 'PLAINTEXT'
        # Login form credentials for the web UI.
        - name: AUTH_TYPE
          value: 'LOGIN_FORM'
        - name: SPRING_SECURITY_USER_NAME
          value: 'devops'
        - name: SPRING_SECURITY_USER_PASSWORD
          value: 'mfniqJkDk'
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-ui
  namespace: kafka-cluster
spec:
  selector:
    app: kafka-ui
  type: NodePort
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080

(10) Point a domain name at kafka-ui so that Kafka information can be viewed through the web interface.
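
One way to put a domain name in front of kafka-ui is an Ingress. The sketch below assumes an NGINX ingress controller is installed in the cluster; the host kafka-ui.example.com and the ingressClassName are placeholders, and the original setup may have exposed the service differently (for example through the NodePort and an external load balancer).

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kafka-ui
  namespace: kafka-cluster          # must be the namespace that holds the kafka-ui Service
spec:
  ingressClassName: nginx           # assumption: an NGINX ingress controller is installed
  rules:
  - host: kafka-ui.example.com      # hypothetical domain; resolve it to the ingress address
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kafka-ui          # the Service defined in the example above
            port:
              number: 8080
```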

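(11) With Kafka installed, the next step is monitoring and alerting. (Installing Prometheus, Alertmanager, and Grafana on K8S with Helm is well documented elsewhere and is not covered here.)

In the namespace where monitoring is deployed, open the Secrets, find the entry that holds the Prometheus scrape configuration, and edit it to add a job for kafka-exporter. (The Helm install above enabled jmx_exporter; I later installed kafka-exporter myself. Install whichever you need; the two exporters can coexist.)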

- job_name: kafka_cluster_exporter
  metrics_path: /metrics
  static_configs:
  - targets:
    - 172.22.6.6:9308  # replace with your own kafka-exporter service address and port

At this point Prometheus is collecting metrics through kafka-exporter.
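
The post does not show how kafka-exporter itself was installed. A minimal sketch, assuming the danielqsj/kafka-exporter image (which listens on port 9308 by default and takes broker addresses through --kafka.server), might look like this. The namespace, the latest tag, and the broker address are assumptions to adapt; the address reuses one from the kafka-ui example above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: kafka-cluster           # assumption: same namespace as Kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      containers:
      - name: kafka-exporter
        image: danielqsj/kafka-exporter:latest   # assumption; pin a version in practice
        args:
        # Example broker address; repeat the flag to list more brokers.
        - --kafka.server=kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9092
        ports:
        - containerPort: 9308        # default metrics port of kafka-exporter
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-exporter
  namespace: kafka-cluster
spec:
  selector:
    app: kafka-exporter
  ports:
  - port: 9308
    targetPort: 9308
```

The target in the scrape job would then be the address of this Service (or of the pod) on port 9308.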

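(12) Configure Kafka alert rules and send the alerts through DingTalk. To cut down on noise, I added a custom label and routed only the Kafka alerts to DingTalk, which avoids being flooded by the large volume of K8S cluster alerts.

(1) First, set up a DingTalk robot in the DingTalk group to obtain its token and secret.

Alertmanager receivers do not support the DingTalk URL directly, so a plugin container, prometheus-webhook-dingtalk, has to be deployed.

Note that when the receiver is DingTalk (webhook_configs), the alert template is specified in the prometheus-webhook-dingtalk plugin, not in the Alertmanager configuration file.

Write the prometheus-webhook-dingtalk configuration file and template

vim prometheus-webhook-dingtalk-config.yaml. Remember to replace the DingTalk URL token and secret with your own.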

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-webhook-dingtalk-config
  namespace: monitoring
data:
  config.yml: |-
    templates:
      - /etc/prometheus-webhook-dingtalk/default.tmpl
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=1f315a3d3b68ae9a5df0f6cde411902c493a10bc3d6ed6bbba8cd8b4bcd1c848
        secret: SEC4d160d1d987b58a19e9a825b83715b253d0b6d0c255b5abb28c265798c535b7e
        message:
          text: '{{ template "default.tmpl" . }}'
  default.tmpl: |
    {{ define "default.tmpl" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts.Firing -}}
    ============ = **<font color='#FF0000'>Alert</font>** = =============
    **Alert name:** {{ $alert.Labels.alertname }}
    **Severity:** {{ $alert.Labels.severity }}
    **Status:** {{ .Status }}
    **Instance:** {{ $alert.Labels.instance }} {{ $alert.Labels.device }}
    **Summary:** {{ .Annotations.summary }}
    **Details:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}
    **Started at:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ============ = end = =============
    {{- end }}
    {{- end }}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts.Resolved -}}
    ============ = <font color='#00FF00'>Resolved</font> = =============
    **Instance:** {{ .Labels.instance }}
    **Alert name:** {{ .Labels.alertname }}
    **Severity:** {{ $alert.Labels.severity }}
    **Status:** {{ .Status }}
    **Summary:** {{ $alert.Annotations.summary }}
    **Details:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}
    **Started at:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    **Resolved at:** {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ============ = **end** = =============
    {{- end }}
    {{- end }}
    {{- end }}

(2) Deploy the prometheus-webhook-dingtalk service. If it was already installed along with Prometheus via Helm, just mount the configuration file above into it.

vim dingtalk-webhook-deploy.yaml

apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: monitoring
  labels:
    app: dingtalk
spec:
  selector:
    app: dingtalk
  ports:
  - name: dingtalk
    port: 8060
    protocol: TCP
    targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      name: dingtalk
      labels:
        app: dingtalk
    spec:
      containers:
      - name: dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v2.1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8060
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus-webhook-dingtalk
      volumes:
      - name: config
        configMap:
          name: prometheus-webhook-dingtalk-config

kubectl -n monitoring apply -f prometheus-webhook-dingtalk-config.yaml -f dingtalk-webhook-deploy.yaml

(3) Create the custom alert rule resource from YAML.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    prometheus-operator-validated: 'true'
  labels:
    app: ack-prometheus-operator
    release: ack-prometheus-operator
  name: ack-prometheus-operator-kafka.rules
  namespace: monitoring
spec:
  groups:
    - name: kafka-cluster-exporter
      rules:
        - alert: KafkaClusterExporterDown
          annotations:
            description: Kafka Cluster Exporter has been down for 1 minute.
            summary: Kafka Cluster Exporter is down
          expr: 'up{job="kafka_cluster_exporter"} == 0'
          for: 1m
          labels:
            product: kafka-cluster
            severity: critical
            status: severe
    - name: kafka-consumer-lag
      rules:
        - alert: KafkaConsumerLag
          annotations:
            description: >-
              {{$.Labels.consumergroup}}##{{$.Labels.topic}}: consumer lag above 500 for 3 minutes (current: {{$value}})
            summary: Kafka consumer lag
          expr: >-
            sum(kafka_consumergroup_lag{topic!="sop_free_study_fix-student_wechat_detail"})
            by (consumergroup, topic) > 500
          for: 3m
          labels:
            product: kafka-cluster
            severity: warning
            status: severe
        - alert: jshop cluster kafka down
          annotations:
            description: 'kafka-cluster-broker down'
            summary: jshop-cluster-broker count is below 3
          expr: 'kafka_brokers{job="kafka_cluster_exporter"} < 3'
          for: 1m
          labels:
            product: kafka-cluster
            severity: warning
            status: severe

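kubectl -n monitoring create -f prometheus-kafka.yaml. The alerts in this YAML carry a custom label, product, with the value kafka-cluster, so that Alertmanager can filter on it later.

(4) Configure the Alertmanager rules.

Note: change the webhook_configs URL below to the address of your own prometheus-webhook-dingtalk service.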

global:
  resolve_timeout: 5m
receivers:
  - name: 'null'
  - name: 'dingtalk'
    webhook_configs:
      - url: 'http://172.22.7.34:8060/dingtalk/webhook1/send'
        send_resolved: true
route:
  group_by:
    - alertname
  group_interval: 5m
  group_wait: 30s
  receiver: "null"
  repeat_interval: 1h
  routes:
    - receiver: "dingtalk"
      match:
        product: 'kafka-cluster'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'kafka', 'instance']
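
Alertmanager 0.22 and later deprecate match, source_match, and target_match in favour of matchers, source_matchers, and target_matchers. If your Alertmanager is recent enough, the routing and inhibition parts of the configuration could be written as in this fragment, which is a sketch of the equivalent syntax rather than the configuration used in the original setup:

```yaml
route:
  receiver: "null"                    # default: drop everything not matched below
  routes:
    - receiver: "dingtalk"
      matchers:
        - product = "kafka-cluster"   # only alerts carrying the custom label
inhibit_rules:
  - source_matchers:
      - severity = "critical"
    target_matchers:
      - severity = "warning"
    equal: ['alertname', 'kafka', 'instance']
```

The other route settings (group_by, group_wait, group_interval, repeat_interval) and the receivers section stay as they are.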

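This post is long and time was limited; corrections are welcome. If you run into problems during deployment, leave a comment and we can discuss them.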
