Hello Kafka (7): Kafka Cluster Monitoring

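I. Kafka Monitoring Metrics

1. Host metrics

Host monitoring tracks the performance of the machines that run the Kafka brokers. Common host metrics include:

(1) Machine load (Load)

(2) CPU utilization

(3) Memory usage, including free memory and used memory

(4) Disk I/O utilization, for both reads and writes

(5) Network I/O utilization

(6) Number of TCP connections

(7) Number of open files

(8) inode usage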
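As a rough illustration of how a few of these host metrics can be sampled, the sketch below reads the load average, disk usage, inode usage, and the open file count of a process using only the Python standard library. The function names are hypothetical, the open file count relies on the Linux /proc filesystem, and a production setup would normally use a dedicated agent rather than a script like this.

```python
import os
import shutil


def host_snapshot(path="/"):
    """Collect a few host metrics for the filesystem mounted at `path`."""
    load_1m, load_5m, load_15m = os.getloadavg()  # Unix only
    disk = shutil.disk_usage(path)
    vfs = os.statvfs(path)
    # Some filesystems report zero total inodes; treat that as "not available".
    inode_used_pct = None
    if vfs.f_files > 0:
        inode_used_pct = 100.0 * (vfs.f_files - vfs.f_ffree) / vfs.f_files
    return {
        "load_1m": load_1m,
        "load_5m": load_5m,
        "load_15m": load_15m,
        "disk_used_pct": 100.0 * disk.used / disk.total,
        "inode_used_pct": inode_used_pct,
    }


def open_file_count(pid):
    """Count the open file descriptors of a process via /proc (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    if not os.path.isdir(fd_dir):
        return None  # not Linux, or the process does not exist
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    print(host_snapshot("/"))
    print("open files of this process:", open_file_count(os.getpid()))
```

To watch a broker, pass the broker's process ID to open_file_count and the Kafka log directory to host_snapshot.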

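2. JVM metrics

A Kafka broker is an ordinary Java process, so any JVM monitoring technique can be applied to it.

(1) Full GC frequency and duration, used to assess the impact of Full GC on the broker process. Long pauses cause the broker to throw all kinds of timeout exceptions.

(2) Size of live objects. This is the key input for sizing the heap and helps fine-tune the size of each JVM generation.

(3) Total number of application threads, which helps in understanding how the broker process uses the CPU.

A sample line from the broker GC log:

2019-07-30T09:13:03.809+0800: 552.982: [GC cleanup 827M->645M(1024M), 0.0019078 secs]

The broker JVM uses the G1 collector by default. In the line above, once the cleanup step finishes, the live data on the heap has shrunk from 827 MB to 645 MB. G1 has been the default collector since Kafka 0.9.0.0, and a G1 Full GC runs on a single thread (in JDK versions before 10) and is very slow. You should therefore monitor the broker GC logs, that is, the files whose names start with kafkaServer-gc.log. If the broker runs Full GC frequently, enable G1's -XX:+PrintAdaptiveSizePolicy flag so that the JVM reports what triggered each Full GC.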
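To make the GC log line above easier to track over time, here is a minimal sketch that extracts the heap sizes and pause time from a G1 cleanup entry. The regular expression only covers the exact layout of the sample line (sizes in M, JDK 8 style logging); other GC events and other JDK log formats would need their own patterns.

```python
import re

# Matches entries such as:
# [GC cleanup 827M->645M(1024M), 0.0019078 secs]
CLEANUP_RE = re.compile(
    r"\[GC cleanup (?P<before>\d+)M->(?P<after>\d+)M\((?P<heap>\d+)M\), "
    r"(?P<secs>\d+\.\d+) secs\]"
)


def parse_cleanup(line):
    """Return heap sizes (MB) and pause time for a G1 cleanup line, or None."""
    match = CLEANUP_RE.search(line)
    if match is None:
        return None
    return {
        "before_mb": int(match["before"]),
        "after_mb": int(match["after"]),
        "heap_mb": int(match["heap"]),
        "pause_secs": float(match["secs"]),
    }


if __name__ == "__main__":
    sample = ("2019-07-30T09:13:03.809+0800: 552.982: "
              "[GC cleanup 827M->645M(1024M), 0.0019078 secs]")
    info = parse_cleanup(sample)
    print(info, "freed MB:", info["before_mb"] - info["after_mb"])
```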

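3. Cluster metrics

(1) Check that the broker process is running and that its port is listening. In containerized deployments, when a Kafka broker is started with Docker, the container may start successfully while a network misconfiguration leaves the process running without a listening port.

(2) Check the key broker logs: the server log server.log, the controller log controller.log, and the topic-partition state change log state-change.log.

(3) Check the state of the key broker threads. A Kafka broker starts a dozen or even several dozen threads. Two kinds matter most in production: the log compaction threads, whose names start with kafka-log-cleaner-thread, and the replica fetcher threads, whose names usually start with ReplicaFetcherThread and which run the logic by which follower replicas fetch messages from leader replicas.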

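(4) Check the key broker JMX metrics.

BytesIn/BytesOut: inbound and outbound bytes per second on the broker. If the values approach the network bandwidth, packet loss becomes likely.

NetworkProcessorAvgIdlePercent: average idle ratio of the network thread pool. Keep it above 30% over the long term. A value below 30% means the network threads are very busy; relieve the broker by adding network threads or by moving load to other servers.

RequestHandlerAvgIdlePercent: average idle ratio of the I/O thread pool. If it stays below 30%, increase the number of I/O threads or reduce the load on the broker.

UnderReplicatedPartitions: number of under-replicated partitions, meaning partitions where not every follower replica is in sync with the leader replica.

ISRShrink/ISRExpand: how often the ISR shrinks and expands. If replicas frequently drop out of and rejoin the ISR in production, these values will be high; diagnose why and take appropriate action.

ActiveControllerCount: number of currently active controllers. Normally the value is 1 on the broker that hosts the controller and 0 on all others. If several brokers report 1, the cluster has a split brain and must be dealt with promptly, starting with a check of network connectivity. Split brain is a very serious distributed-systems failure. Kafka currently relies on ZooKeeper to prevent it, and once it happens Kafka cannot be guaranteed to work correctly.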

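(5) Monitor the Kafka clients. Watch the network round-trip time (RTT) between the client machines and the Kafka brokers. For producers, the thread whose name starts with kafka-producer-network-thread does the actual sending; if it dies, the producer stops working, but the producer process does not exit on its own. For consumers, the heartbeat thread whose name starts with kafka-coordinator-heartbeat-thread is critical to rebalancing.

On the producer side, the JMX metric to watch is request-latency, the latency of produce requests, which most directly reflects the producer's TPS. On the consumer side, records-lag and records-lead are the two important JMX metrics: records-lag is how many messages the consumer trails the latest message in the partition, and records-lead is how far the consumer is ahead of the earliest message still retained. With consumer groups, also watch the join rate and sync rate metrics, which show how often rebalances occur.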
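The broker-side rules from item (4) can be turned into simple checks. The sketch below assumes the JMX values have already been collected into plain dictionaries, that the idle metrics are reported as fractions between 0 and 1, and that the function names and message strings are purely illustrative. Because the 30% guideline is about sustained values, a real alert would evaluate an average over a window rather than a single sample.

```python
def check_broker(metrics, idle_threshold=0.30):
    """Return a list of warnings for one broker's JMX snapshot."""
    warnings = []
    for name in ("NetworkProcessorAvgIdlePercent", "RequestHandlerAvgIdlePercent"):
        value = metrics.get(name)
        if value is not None and value < idle_threshold:
            warnings.append(f"{name} is {value:.0%}, below {idle_threshold:.0%}")
    under_replicated = metrics.get("UnderReplicatedPartitions", 0)
    if under_replicated > 0:
        warnings.append(f"{under_replicated} under-replicated partitions")
    return warnings


def check_controllers(active_controller_counts):
    """Check ActiveControllerCount across brokers (broker name -> value)."""
    total = sum(active_controller_counts.values())
    if total == 1:
        return "ok"
    if total == 0:
        return "no active controller"
    holders = sorted(b for b, v in active_controller_counts.items() if v >= 1)
    return "split-brain: controllers on " + ", ".join(holders)


if __name__ == "__main__":
    print(check_broker({
        "NetworkProcessorAvgIdlePercent": 0.22,
        "RequestHandlerAvgIdlePercent": 0.75,
        "UnderReplicatedPartitions": 3,
    }))
    print(check_controllers({"broker1": 1, "broker2": 0, "broker3": 0}))
```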

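II. Monitoring Kafka with JMX

1. Introduction to JMX

JMX (Java Management Extensions) manages and monitors running Java programs. It is used to manage threads, memory, log levels, service restarts, the system environment, and so on.

2. Enabling JMX in Kafka

The JMX port can be enabled in any of the following ways:

(1) Set JMX_PORT when starting Kafka

JMX_PORT=9999 kafka-server-start.sh -daemon config/server.properties

(2) Modify kafka-run-class.sh

Add the following line at the top of kafka-run-class.sh:

JMX_PORT=9999

Restart the Kafka cluster after modifying kafka-run-class.sh.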

(3) Enable JMX for a Kafka Docker container service

Add the KAFKA_JMX_OPTS and JMX_PORT environment variables to the docker-compose.yml file of the Kafka container service.

KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=192.168.0.105 -Dcom.sun.management.jmxremote.rmi.port=9999"
JMX_PORT: 9999

Expose the JMX port externally.

ports:
  - "9999:9999" # expose the port externally
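Once the port is published, it is worth confirming that something is actually listening on it, since a container can be up while the port is not reachable. The following is a minimal sketch using a plain TCP connect; the helper name is hypothetical, and the host and port are the example values used in this article. A successful connect only shows that the port is open, not that a JMX client can complete the RMI handshake.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Example values from this article: broker host and JMX port.
    print(port_is_listening("192.168.0.105", 9999))
```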

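3. JMX_PORT conflicts

Monitoring Kafka broker and topic data requires JMX_PORT. The variable is usually defined in kafka-run-class.sh, but once it is defined there, running the tool scripts in the bin directory fails. The reason is that kafka-run-class.sh is invoked by the other scripts: each invocation makes Java bind JMX_PORT again, and the port is already taken by the broker.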
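The failure is an ordinary "address already in use" error. The sketch below reproduces the same situation with two plain sockets on the loopback interface: the first bind stands in for the broker's JMX listener, and the second for a command-line tool that inherits the same JMX_PORT. The helper name is illustrative, and an ephemeral port is used so the demo does not depend on port 9999 being free.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind and listen on host:port; return the socket, or None if it is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        sock.close()
        return None


if __name__ == "__main__":
    broker = try_bind(0)  # port 0 lets the OS pick a free port
    port = broker.getsockname()[1]
    tool = try_bind(port)  # second process trying the same port
    print("second bind succeeded" if tool else "second bind failed: port in use")
    broker.close()
```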

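The fix is to specify JMX_PORT only when starting Kafka.

(1) When starting Kafka with supervisor, add environment=JMX_PORT=9999 to the supervisor service configuration file.

(2) When starting Kafka with kafka-server-start.sh, run export JMX_PORT=9999 at startup or set the variable inside the kafka-server-start.sh script.

(3) Modify the kafka-run-class.sh script

Edit the bin/kafka-run-class.sh file in the Kafka installation directory: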

III. Kafka Monitoring Tools

1. JMXTool

JMXTool is a tool from the Kafka community that displays Kafka JMX metrics in real time.

kafka-run-class.sh kafka.tools.JmxTool

--attributes: the JMX attribute names to query, as a comma-separated list.

--date-format: the date format used in the output.

--jmx-url: the JMX endpoint to connect to; the default format is service:jmx:rmi:///jndi/rmi://:<JMX port>/jmxrmi.

--object-name: the JMX MBean name to query.

--reporting-interval: the polling interval in milliseconds; the default is 2000 (2 s).

The following command queries, once per second, the broker's inbound bytes per second averaged over the past minute (BytesInPerSec):

kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9999/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --attributes OneMinuteRate --reporting-interval 1000

The following command displays the ActiveControllerCount JMX metric:

kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.controller:type=KafkaController,name=ActiveControllerCount --jmx-url service:jmx:rmi:///jndi/rmi://:9999/jmxrmi --date-format "yyyy-MM-dd HH:mm:ss" --reporting-interval 1000
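When the same JmxTool invocation is needed for many MBeans, it can help to assemble the argument list programmatically. The sketch below only uses the options described above; the function name and its defaults are illustrative, and it assumes kafka-run-class.sh is on the PATH when the command is eventually executed.

```python
import shlex


def jmxtool_command(object_name, jmx_port, attributes=None,
                    interval_ms=1000, host=""):
    """Build the argument list for a kafka.tools.JmxTool invocation."""
    url = f"service:jmx:rmi:///jndi/rmi://{host}:{jmx_port}/jmxrmi"
    cmd = [
        "kafka-run-class.sh", "kafka.tools.JmxTool",
        "--object-name", object_name,
        "--jmx-url", url,
        "--date-format", "yyyy-MM-dd HH:mm:ss",
        "--reporting-interval", str(interval_ms),
    ]
    if attributes:
        # JmxTool expects the attribute names as one comma-separated value.
        cmd += ["--attributes", ",".join(attributes)]
    return cmd


if __name__ == "__main__":
    cmd = jmxtool_command(
        "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
        9999,
        attributes=["OneMinuteRate"],
    )
    # Print a copy-pasteable shell command; subprocess.run(cmd) would execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```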

2. Kafka Manager

Kafka Manager is a Kafka monitoring framework open-sourced by Yahoo in 2015. It is written in Scala and is mainly used to manage and monitor Kafka clusters.

Kafka Manager has since been renamed CMAK (Cluster Manager for Apache Kafka).

GitHub:

https://github.com/yahoo/CMAK

Kafka Manager Docker image: kafkamanager/kafka-manager

To enable basic authentication, set the following environment variables for Kafka Manager:

KAFKA_MANAGER_AUTH_ENABLED: "true"
KAFKA_MANAGER_USERNAME: username
KAFKA_MANAGER_PASSWORD: password

The docker-compose.yml for deploying the Kafka Manager service is as follows:

# define the kafka-manager service
kafka-manager-test:
  image: kafkamanager/kafka-manager # kafka-manager image
  restart: always
  container_name: kafka-manager-test
  hostname: kafka-manager-test
  ports:
    - "9000:9000"  # expose the port for web access
  depends_on:
    - kafka-test # dependency
  environment:
    ZK_HOSTS: zookeeper-test:2181 # ZooKeeper host and port
    KAFKA_BROKERS: kafka-test:9090 # Kafka broker
    KAFKA_MANAGER_AUTH_ENABLED: "true"
    KAFKA_MANAGER_USERNAME: admin
    KAFKA_MANAGER_PASSWORD: password

Start the Kafka Manager service and log in to the Kafka Manager web UI.

Web address: http://127.0.0.1:9000

Add the Kafka broker nodes to be managed in Kafka Manager:

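3. JMXTrans + InfluxDB + Grafana

A common monitoring stack is JMXTrans + InfluxDB + Grafana. Because Grafana can chart the JMX metrics that JMXTrans stores in InfluxDB, the various Kafka JMX metrics are easy to integrate. Companies that already run JMXTrans + InfluxDB + Grafana can reuse the existing stack directly, which greatly reduces operating costs.

4. Confluent Control Center

Control Center monitors Kafka clusters in real time and also helps operate and build Kafka-based real-time stream processing applications. Control Center is not free; it is only available with the enterprise edition of Confluent Platform.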

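5. jconsole

jconsole (Java Monitoring and Management Console) is a JMX-based graphical monitoring and management tool. It provides an overview plus monitoring of memory, threads, classes, the VM summary, and MBeans.

Run jconsole in a Linux terminal, and in the remote process field of the window that opens enter service:jmx:rmi:///jndi/rmi://192.168.0.105:9999/jmxrmi or 192.168.0.105:9999.

Select the MBeans tab.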

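6. KafkaCenter

KafkaCenter distills the EC Bigdata Team's years of experience running Kafka into a unified one-stop platform that covers cluster management, cluster operations, producer monitoring, consumer monitoring, and the surrounding ecosystem. It is open source.

Main KafkaCenter modules:

(1) Home: view information and monitoring data for the Kafka clusters managed by the platform.

(2) Topic: users can view their own topics, request new topics, and run produce and consume tests against a topic.

(3) Monitor: users can view how a topic is produced to and consumed from, and set alerts on consumer lag.

(4) Kafka Connect: lets users quickly create and maintain their own Connect jobs.

(5) KSQL: lets users quickly create and maintain their own KSQL jobs.

(6) Approve: lets administrators approve topic creation requests from regular users.

(7) Setting: lets administrators maintain User, Team, and Kafka cluster information.

(8) Kafka Manager: lets administrators perform routine cluster maintenance.

GitHub: https://github.com/xaecbd/KafkaCenter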

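IV. JMXTrans

1. Introduction to JMXTrans

JMXTrans is a collector that gathers data from Java applications over JMX. Any Java application with an open JMX port can be collected from.

JMXTrans runs as a background daemon and collects data once per minute.

GitHub: https://github.com/jmxtrans/jmxtrans

Pull the JMXTrans Docker image:

docker pull jmxtrans/jmxtrans

2. JMXTrans configuration files

By default JMXTrans reads every data source configuration file (in JSON format) under /var/lib/jmxtrans, fetches data from the data sources, parses it, and stores it in InfluxDB.

A JMXTrans JSON configuration file looks like this: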

{
   "servers": [{
      "port": "9901",
      "host": "192.168.0.105",
      "queries": [{
         "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
         "attr": ["MeanRate", "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate"],
         "resultAlias": "kafkaServer",
         "outputWriters": [{
            "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
            "url": "http://192.168.0.105:8086/",
            "username": "admin",
            "password": "123456",
            "database": "jmx",
            "tags": {
               "application": "kafka_server"
            }
         }]
      }]
   }]
}

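servers: array, the data source configuration.

port: string, the JMX port of the Java process to collect from.

host: string, the IP address of the Java process to collect from.

queries: array, the metrics to monitor, listed as JSON objects. The available metrics can be looked up with jconsole, which ships with the JDK.

obj: string, the name of the metric (the MBean ObjectName).

attr: array, the MBean attributes to store; they become the field names in the target measurement.

resultAlias: string, the measurement (table) name in InfluxDB.

outputWriters: array, the data destinations.

@class: string, the class of the data destination.

url: string, the URL of the data destination (InfluxDB).

username: string, the InfluxDB login name.

password: string, the InfluxDB login password.

database: string, the InfluxDB database name (must be created in advance).

tags: JSON object, tags that keep metrics apart when their fields would otherwise share the same name in an InfluxDB measurement.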

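3. Kafka JMX metrics

Kafka's JMX metrics can be looked up with jconsole.

For the BytesInPerSec metric, find BytesInPerSec on the MBeans tab in jconsole.

The ObjectName value is the value to use for obj.

The attributes of the MBean are the values to use for "attr"; select one or more.

resultAlias is the metric name, which becomes the MEASUREMENT name in InfluxDB.

"tags" maps to InfluxDB tags and distinguishes the different metrics stored in the same MEASUREMENT.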

{
  "obj": "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
  "attr": [ "Count", "EventType", "RateUnit", "OneMinuteRate" ],
  "resultAlias": "BytesInPerSec",
  "outputWriters": [{
    "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
    "url": "http://192.168.0.105:8086/",
    "username": "admin",
    "password": "123456",
    "database": "jmx",
    "tags": {
      "application": "BytesInPerSec"
    }
  }]
}

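For global monitoring, each metric maps to one InfluxDB MEASUREMENT, and all Kafka nodes write the same metric to the same MEASUREMENT. For topic metrics, all Kafka nodes write the data for a given topic to the same MEASUREMENT, which is named after the topic.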

{
  "servers" : [ {
    "port" : "9999",
    "host" : "192.168.0.105",
    "queries" : [ {
      "obj" : "java.lang:type=Memory",
      "attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
      "resultAlias":"jvmMemory",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"kafkaServer",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
      "attr" : [ "Value" ],
      "resultAlias":"underReplicated",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.controller:type=KafkaController,name=ActiveControllerCount",
      "attr" : [ "Value" ],
      "resultAlias":"activeController",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "java.lang:type=OperatingSystem",
      "attr" : [ "FreePhysicalMemorySize","SystemCpuLoad","ProcessCpuLoad","SystemLoadAverage" ],
      "resultAlias":"jvmMemory",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent",
      "attr" : [ "Value" ],
      "resultAlias":"network",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
      "attr" : [ "MeanRate","OneMinuteRate","FiveMinuteRate","FifteenMinuteRate" ],
      "resultAlias":"network",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    },{
      "obj" : "java.lang:type=GarbageCollector,name=G1 Young Generation",
      "attr" : [ "CollectionCount","CollectionTime" ],
      "resultAlias":"gc",
      "outputWriters" : [ {
        "@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
        "url" : "http://192.168.0.105:8086/",
        "username" : "admin",
        "password" : "123456",
        "database" : "jmx",
        "tags"     : {"application" : "kafka_server"}
      } ]
    }]
  } ]
}
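Every query in the configuration above repeats the same outputWriters block, so one way to keep such a file maintainable is to generate it. The sketch below is one possible approach, not part of JMXTrans itself: the helper names are hypothetical, the writer settings are copied from the example above, and only a few of the metrics are listed to keep it short.

```python
import copy
import json

# Shared InfluxDB writer settings, taken from the example configuration.
INFLUX_WRITER = {
    "@class": "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
    "url": "http://192.168.0.105:8086/",
    "username": "admin",
    "password": "123456",
    "database": "jmx",
    "tags": {"application": "kafka_server"},
}
RATE_ATTRS = ["MeanRate", "OneMinuteRate", "FiveMinuteRate", "FifteenMinuteRate"]

# (ObjectName, attributes, resultAlias) for a subset of the metrics above.
METRICS = [
    ("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec", RATE_ATTRS, "kafkaServer"),
    ("kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec", RATE_ATTRS, "kafkaServer"),
    ("kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions", ["Value"], "underReplicated"),
    ("kafka.controller:type=KafkaController,name=ActiveControllerCount", ["Value"], "activeController"),
]


def build_query(obj, attrs, alias):
    """Build one entry of the "queries" array."""
    return {
        "obj": obj,
        "attr": list(attrs),
        "resultAlias": alias,
        "outputWriters": [copy.deepcopy(INFLUX_WRITER)],
    }


def build_config(host, port, metrics=METRICS):
    """Build a complete JMXTrans configuration for one JMX endpoint."""
    queries = [build_query(*metric) for metric in metrics]
    return {"servers": [{"port": str(port), "host": host, "queries": queries}]}


if __name__ == "__main__":
    print(json.dumps(build_config("192.168.0.105", 9999), indent=2))
```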

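4. Deploying JMXTrans

JMX connects over the network, so there are two ways to deploy JMXTrans:

(1) Centralized. Deploy JMXTrans on one server, connect it to every Kafka broker instance, and write the data to InfluxDB. To reduce network traffic it is usually deployed on the InfluxDB server.

(2) Distributed. Deploy one JMXTrans instance per Kafka broker instance.

JMXTrans configuration files are split into global metrics (per Kafka node) and topic metrics. Global metrics use one configuration file per node, named kafka-brokerxx.json; topic metrics use one configuration file per topic, named TopicName.json.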
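For the per-node files, a small script can write one configuration per broker following the kafka-brokerxx.json naming rule. This is only a sketch: it assumes xx is a two-digit, one-based broker index, the function names are hypothetical, and the second broker address in the demo is made up for illustration.

```python
import json
import os
import tempfile


def broker_config_name(index):
    """File name for the global-metrics config of broker number `index`."""
    return f"kafka-broker{index:02d}.json"  # assumption: xx is a 2-digit index


def write_broker_configs(directory, brokers, queries):
    """Write one JMXTrans config per (host, jmx_port) pair; return the paths."""
    paths = []
    for index, (host, port) in enumerate(brokers, start=1):
        config = {"servers": [{"host": host, "port": str(port), "queries": queries}]}
        path = os.path.join(directory, broker_config_name(index))
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(config, handle, indent=2)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # 192.168.0.106 is a hypothetical second broker.
    brokers = [("192.168.0.105", 9999), ("192.168.0.106", 9999)]
    with tempfile.TemporaryDirectory() as directory:
        for path in write_broker_configs(directory, brokers, queries=[]):
            print(path)
```

In a real deployment the target directory would be the one JMXTrans reads, and queries would hold the metric definitions shown earlier.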

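V. A Kafka Monitoring Setup Example

1. Choosing the monitoring architecture

A monitoring system usually has three parts: data collection, analysis and transformation, and data presentation (visualization).

(1) Data collection

Data collection usually means writing a collector and then having monitoring software such as Nagios or Zabbix schedule it and report the collected data. For Java programs, JMXTrans can do the collecting.

(2) Analysis and transformation

Kafka is a Java application and already exposes very comprehensive performance metrics, with histograms, counts, minimums, maximums, and standard deviations precomputed. No further analysis is needed, so the MBean data is stored directly in InfluxDB.

(3) Data visualization

Grafana is an open-source dashboard that supports Graphite, Zabbix, InfluxDB, Prometheus, and OpenTSDB as data sources.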
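To make step (2) concrete, the sketch below shows what a single MBean sample looks like once it is expressed as an InfluxDB line protocol point (measurement, tags, fields, timestamp). JMXTrans performs the actual write through its InfluxDB writer; this helper is hypothetical and only illustrates the shape of the stored data, using the measurement and tag names from the JMXTrans configuration in this article.

```python
def _escape(value):
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return str(value).replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value):
    """Format a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Render one point in InfluxDB line protocol."""
    if not fields:
        raise ValueError("a point needs at least one field")
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_format_field(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line(
        "kafkaServer",
        {"application": "kafka_server"},
        {"OneMinuteRate": 12.5},
        1564449183809000000,
    ))
```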

2. Deploying InfluxDB

InfluxDB is an open-source distributed time series, event, and metrics database written in Go with no external dependencies. It is mainly used to store large volumes of timestamped data, such as DevOps monitoring data, app metrics, IoT sensor data, and real-time analytics data.

docker pull influxdb

The influxdb.yml file:

version: '2'
services:
  influxdb:
    image: influxdb
    container_name: influxdb
    volumes:
      - /data/influxdb/conf:/etc/influxdb
      - /data/influxdb/data:/var/lib/influxdb/data
      - /data/influxdb/meta:/var/lib/influxdb/meta
      - /data/influxdb/wal:/var/lib/influxdb/wal
    ports:
      - "8086:8086"
    restart: always

Check the result:

docker exec -it influxdb influx
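The JMXTrans writer expects its target database to exist already. Besides the influx shell, the database can be created over HTTP. The sketch below builds such a request under the assumption that the server speaks the InfluxDB 1.x HTTP API (the /query endpoint with the u and p parameters); the untagged influxdb image may resolve to a different major version where this does not apply. The function name is hypothetical, and the URL, database name, and credentials are the example values from this article.

```python
import urllib.parse
import urllib.request


def create_database_request(base_url, database, username=None, password=None):
    """Build a POST /query request that creates `database` (InfluxDB 1.x API)."""
    params = {"q": f'CREATE DATABASE "{database}"'}
    if username is not None:
        params["u"] = username
        params["p"] = password or ""
    body = urllib.parse.urlencode(params).encode("utf-8")
    url = urllib.parse.urljoin(base_url, "/query")
    return urllib.request.Request(url, data=body, method="POST")


if __name__ == "__main__":
    request = create_database_request(
        "http://192.168.0.105:8086/", "jmx", "admin", "123456"
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request, timeout=5)
```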

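3. Deploying JMXTrans

JMXTrans is a collector that gathers data from Java applications over JMX. Any Java application with an open JMX port can be collected from.

docker pull jmxtrans/jmxtrans

By default JMXTrans reads every data source configuration file (in JSON format) under /var/lib/jmxtrans, fetches data from the data sources, parses it, and stores it in InfluxDB. The compose file below therefore mounts the local ./jmxtrans directory at that path.

version: '2'
services:
  # JMXTrans service
  jmxtrans:
    image: jmxtrans/jmxtrans
    container_name: jmxtrans
    volumes:
      - ./jmxtrans:/var/lib/jmxtrans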

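4. Deploying Grafana

Grafana is a visualization dashboard with attractive charts and layouts, a full-featured metrics dashboard, and a graph editor. It supports Graphite, Zabbix, InfluxDB, Prometheus, and OpenTSDB as data sources.

Main Grafana features:

(1) Visualization: fast, flexible client-side charts. Panel plugins offer many ways to visualize metrics and logs, and the official library has a rich set of dashboard plugins, such as heatmaps, line charts, and graphs.

(2) Data sources: Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch, KairosDB, and others.

(3) Alerting: define alert rules for the most important metrics visually. Grafana evaluates them continuously and sends notifications through Slack, PagerDuty, and similar channels when the data reaches a threshold.

(4) Mixed display: mix different data sources in the same chart. The data source can be specified per query, and custom data sources are supported.

(5) Annotations: annotate charts with rich events from different data sources. Hovering over an event shows its full metadata and tags.

(6) Filters: ad-hoc filters allow new key/value filters to be created dynamically and are applied automatically to every query that uses the data source.

GitHub: https://github.com/grafana/grafana

Pull the Grafana container image:

docker pull grafana/grafana:6.5.0

Start the Grafana container:

docker run -d --name=grafana -p 3000:3000 grafana/grafana:6.5.0

Web login: 192.168.0.105:3000

The first login uses the default admin/admin credentials, after which Grafana asks for a new password.

Add a data source: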
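Besides the web UI, a data source can also be registered through Grafana's HTTP API. The sketch below builds a POST request to /api/datasources for the InfluxDB instance used in this article. It rests on several assumptions: basic authentication with the Grafana admin account, the InfluxDB password being accepted under secureJsonData in this Grafana version, and the data source name KafkaMonitor chosen to match the label in the dashboard template. The function name is hypothetical.

```python
import base64
import json
import urllib.request


def influx_datasource_request(grafana_url, admin_user, admin_password,
                              influx_url, database, influx_user,
                              influx_password, name="KafkaMonitor"):
    """Build a POST /api/datasources request for an InfluxDB data source."""
    payload = {
        "name": name,
        "type": "influxdb",
        "access": "proxy",  # Grafana's backend talks to InfluxDB
        "url": influx_url,
        "database": database,
        "user": influx_user,
        "secureJsonData": {"password": influx_password},
    }
    credentials = f"{admin_user}:{admin_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = influx_datasource_request(
        "http://192.168.0.105:3000", "admin", "admin",
        "http://192.168.0.105:8086", "jmx", "admin", "123456",
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request, timeout=5)
```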

Import the dashboard template:

The dashboard template JSON file is as follows:

  1. {
  2.   "__inputs": [
  3.     {
  4.       "name": "DS_KAFKAMONITOR",
  5.       "label": "KafkaMonitor",
  6.       "description": "",
  7.       "type": "datasource",
  8.       "pluginId": "influxdb",
  9.       "pluginName": "InfluxDB"
  10.     }
  11.   ],
  12.   "__requires": [
  13.     {
  14.       "type": "grafana",
  15.       "id": "grafana",
  16.       "name": "Grafana",
  17.       "version": "6.7.3"
  18.     },
  19.     {
  20.       "type": "panel",
  21.       "id": "graph",
  22.       "name": "Graph",
  23.       "version": ""
  24.     },
  25.     {
  26.       "type": "datasource",
  27.       "id": "influxdb",
  28.       "name": "InfluxDB",
  29.       "version": "1.0.0"
  30.     }
  31.   ],
  32.   "annotations": {
  33.     "list": [
  34.       {
  35.         "$$hashKey": "object:318",
  36.         "builtIn": 1,
  37.         "datasource": "-- Grafana --",
  38.         "enable": true,
  39.         "hide": true,
  40.         "iconColor": "rgba(0, 211, 255, 1)",
  41.         "name": "Annotations & Alerts",
  42.         "type": "dashboard"
  43.       }
  44.     ]
  45.   },
  46.   "editable": true,
  47.   "gnetId": null,
  48.   "graphTooltip": 0,
  49.   "id": null,
  50.   "links": [],
  51.   "panels": [
  52.     {
  53.       "aliasColors": {},
  54.       "bars": false,
  55.       "dashLength": 10,
  56.       "dashes": false,
  57.       "datasource": "${DS_KAFKAMONITOR}",
  58.       "description": "java.lang:type=OperatingSystem",
  59.       "fill": 1,
  60.       "fillGradient": 0,
  61.       "gridPos": {
  62.         "h": 12,
  63.         "w": 8,
  64.         "x": 0,
  65.         "y": 0
  66.       },
  67.       "hiddenSeries": false,
  68.       "id": 6,
  69.       "legend": {
  70.         "alignAsTable": true,
  71.         "avg": true,
  72.         "current": true,
  73.         "max": true,
  74.         "min": true,
  75.         "show": true,
  76.         "total": false,
  77.         "values": true
  78.       },
  79.       "lines": true,
  80.       "linewidth": 1,
  81.       "nullPointMode": "null",
  82.       "options": {
  83.         "dataLinks": []
  84.       },
  85.       "percentage": false,
  86.       "pointradius": 2,
  87.       "points": false,
  88.       "renderer": "flot",
  89.       "seriesOverrides": [],
  90.       "spaceLength": 10,
  91.       "stack": false,
  92.       "steppedLine": false,
  93.       "targets": [
  94.         {
  95.           "alias": "",
  96.           "groupBy": [
  97.             {
  98.               "params": [
  99.                 "1m"
  100.               ],
  101.               "type": "time"
  102.             },
  103.             {
  104.               "params": [
  105.                 "hostname"
  106.               ],
  107.               "type": "tag"
  108.             },
  109.             {
  110.               "params": [
  111.                 "null"
  112.               ],
  113.               "type": "fill"
  114.             }
  115.           ],
  116.           "measurement": "jvmMemory",
  117.           "orderByTime": "ASC",
  118.           "policy": "default",
  119.           "refId": "A",
  120.           "resultFormat": "time_series",
  121.           "select": [
  122.             [
  123.               {
  124.                 "params": [
  125.                   "ProcessCpuLoad"
  126.                 ],
  127.                 "type": "field"
  128.               },
  129.               {
  130.                 "params": [],
  131.                 "type": "last"
  132.               },
  133.               {
  134.                 "params": [
  135.                   "进程CPU使用率"
  136.                 ],
  137.                 "type": "alias"
  138.               }
  139.             ]
  140.           ],
  141.           "tags": []
  142.         }
  143.       ],
  144.       "thresholds": [],
  145.       "timeFrom": null,
  146.       "timeRegions": [],
  147.       "timeShift": null,
  148.       "title": "Kafka进程CPU使用率",
  149.       "tooltip": {
  150.         "shared": true,
  151.         "sort": 0,
  152.         "value_type": "individual"
  153.       },
  154.       "type": "graph",
  155.       "xaxis": {
  156.         "buckets": null,
  157.         "mode": "time",
  158.         "name": null,
  159.         "show": true,
  160.         "values": []
  161.       },
  162.       "yaxes": [
  163.         {
  164.           "$$hashKey": "object:1134",
  165.           "format": "percentunit",
  166.           "label": null,
  167.           "logBase": 1,
  168.           "max": null,
  169.           "min": null,
  170.           "show": true
  171.         },
  172.         {
  173.           "$$hashKey": "object:1135",
  174.           "format": "short",
  175.           "label": null,
  176.           "logBase": 1,
  177.           "max": null,
  178.           "min": null,
  179.           "show": true
  180.         }
  181.       ],
  182.       "yaxis": {
  183.         "align": false,
  184.         "alignLevel": null
  185.       }
  186.     },
  187.     {
  188.       "aliasColors": {},
  189.       "bars": false,
  190.       "dashLength": 10,
  191.       "dashes": false,
  192.       "datasource": "${DS_KAFKAMONITOR}",
  193.       "description": "服务器CPU使用率",
  194.       "fill": 1,
  195.       "fillGradient": 0,
  196.       "gridPos": {
  197.         "h": 12,
  198.         "w": 8,
  199.         "x": 8,
  200.         "y": 0
  201.       },
  202.       "hiddenSeries": false,
  203.       "id": 2,
  204.       "legend": {
  205.         "alignAsTable": true,
  206.         "avg": true,
  207.         "current": true,
  208.         "max": true,
  209.         "min": true,
  210.         "show": true,
  211.         "total": false,
  212.         "values": true
  213.       },
  214.       "lines": true,
  215.       "linewidth": 1,
  216.       "nullPointMode": "null",
  217.       "options": {
  218.         "dataLinks": []
  219.       },
  220.       "percentage": false,
  221.       "pointradius": 2,
  222.       "points": false,
  223.       "renderer": "flot",
  224.       "seriesOverrides": [],
  225.       "spaceLength": 10,
  226.       "stack": false,
  227.       "steppedLine": false,
  228.       "targets": [
  229.         {
  230.           "alias": "",
  231.           "groupBy": [
  232.             {
  233.               "params": [
  234.                 "1m"
  235.               ],
  236.               "type": "time"
  237.             },
  238.             {
  239.               "params": [
  240.                 "hostname"
  241.               ],
  242.               "type": "tag"
  243.             },
  244.             {
  245.               "params": [
  246.                 "null"
  247.               ],
  248.               "type": "fill"
  249.             }
  250.           ],
  251.           "measurement": "jvmMemory",
  252.           "orderByTime": "ASC",
  253.           "policy": "default",
  254.           "refId": "A",
  255.           "resultFormat": "time_series",
  256.           "select": [
  257.             [
  258.               {
  259.                 "params": [
  260.                   "SystemCpuLoad"
  261.                 ],
  262.                 "type": "field"
  263.               },
  264.               {
  265.                 "params": [],
  266.                 "type": "last"
  267.               },
  268.               {
  269.                 "params": [
  270.                   "CPU使用率"
  271.                 ],
  272.                 "type": "alias"
  273.               }
  274.             ]
  275.           ],
  276.           "tags": []
  277.         }
  278.       ],
  279.       "thresholds": [],
  280.       "timeFrom": null,
  281.       "timeRegions": [],
  282.       "timeShift": null,
  283.       "title": "CPU使用率",
  284.       "tooltip": {
  285.         "shared": true,
  286.         "sort": 0,
  287.         "value_type": "individual"
  288.       },
  289.       "type": "graph",
  290.       "xaxis": {
  291.         "buckets": null,
  292.         "mode": "time",
  293.         "name": null,
  294.         "show": true,
  295.         "values": []
  296.       },
  297.       "yaxes": [
  298.         {
  299.           "$$hashKey": "object:369",
  300.           "format": "percentunit",
  301.           "label": null,
  302.           "logBase": 1,
  303.           "max": null,
  304.           "min": null,
  305.           "show": true
  306.         },
  307.         {
  308.           "$$hashKey": "object:370",
  309.           "format": "short",
  310.           "label": null,
  311.           "logBase": 1,
  312.           "max": null,
  313.           "min": null,
  314.           "show": true
  315.         }
  316.       ],
  317.       "yaxis": {
  318.         "align": false,
  319.         "alignLevel": null
  320.       }
  321.     },
  322.     {
  323.       "aliasColors": {},
  324.       "bars": false,
  325.       "dashLength": 10,
  326.       "dashes": false,
  327.       "datasource": "${DS_KAFKAMONITOR}",
  328.       "description": "java.lang:type=OperatingSystem\nLinux系统负载",
  329.       "fill": 1,
  330.       "fillGradient": 0,
  331.       "gridPos": {
  332.         "h": 12,
  333.         "w": 8,
  334.         "x": 16,
  335.         "y": 0
  336.       },
  337.       "hiddenSeries": false,
  338.       "id": 4,
  339.       "legend": {
  340.         "alignAsTable": true,
  341.         "avg": false,
  342.         "current": true,
  343.         "max": true,
  344.         "min": false,
  345.         "show": true,
  346.         "total": false,
  347.         "values": true
  348.       },
  349.       "lines": true,
  350.       "linewidth": 1,
  351.       "nullPointMode": "null",
  352.       "options": {
  353.         "dataLinks": []
  354.       },
  355.       "percentage": false,
  356.       "pointradius": 2,
  357.       "points": false,
  358.       "renderer": "flot",
  359.       "seriesOverrides": [],
  360.       "spaceLength": 10,
  361.       "stack": false,
  362.       "steppedLine": false,
  363.       "targets": [
  364.         {
  365.           "alias": "",
  366.           "groupBy": [
  367.             {
  368.               "params": [
  369.                 "1m"
  370.               ],
  371.               "type": "time"
  372.             },
  373.             {
  374.               "params": [
  375.                 "hostname"
  376.               ],
  377.               "type": "tag"
  378.             },
  379.             {
  380.               "params": [
  381.                 "null"
  382.               ],
  383.               "type": "fill"
  384.             }
  385.           ],
  386.           "measurement": "jvmMemory",
  387.           "orderByTime": "ASC",
  388.           "policy": "default",
  389.           "refId": "A",
  390.           "resultFormat": "time_series",
  391.           "select": [
  392.             [
  393.               {
  394.                 "params": [
  395.                   "SystemLoadAverage"
  396.                 ],
  397.                 "type": "field"
  398.               },
  399.               {
  400.                 "params": [],
  401.                 "type": "last"
  402.               },
  403.               {
  404.                 "params": [
  405.                   "系统负载"
  406.                 ],
  407.                 "type": "alias"
  408.               }
  409.             ]
  410.           ],
  411.           "tags": []
  412.         }
  413.       ],
  414.       "thresholds": [],
  415.       "timeFrom": null,
  416.       "timeRegions": [],
  417.       "timeShift": null,
  418.       "title": "系统负载",
  419.       "tooltip": {
  420.         "shared": true,
  421.         "sort": 0,
  422.         "value_type": "individual"
  423.       },
  424.       "type": "graph",
  425.       "xaxis": {
  426.         "buckets": null,
  427.         "mode": "time",
  428.         "name": null,
  429.         "show": true,
  430.         "values": []
  431.       },
  432.       "yaxes": [
  433.         {
  434.           "$$hashKey": "object:656",
  435.           "format": "short",
  436.           "label": null,
  437.           "logBase": 1,
  438.           "max": null,
  439.           "min": null,
  440.           "show": true
  441.         },
  442.         {
  443.           "$$hashKey": "object:657",
  444.           "format": "short",
  445.           "label": null,
  446.           "logBase": 1,
  447.           "max": null,
  448.           "min": null,
  449.           "show": true
  450.         }
  451.       ],
  452.       "yaxis": {
  453.         "align": false,
  454.         "alignLevel": null
  455.       }
  456.     },
  457.     {
  458.       "aliasColors": {},
  459.       "bars": false,
  460.       "dashLength": 10,
  461.       "dashes": false,
  462.       "datasource": "${DS_KAFKAMONITOR}",
  463.       "description": "Messages per second received by each Kafka broker, including the __consumer_offsets topic",
  464.       "fill": 1,
  465.       "fillGradient": 0,
  466.       "gridPos": {
  467.         "h": 12,
  468.         "w": 8,
  469.         "x": 0,
  470.         "y": 12
  471.       },
  472.       "hiddenSeries": false,
  473.       "id": 34,
  474.       "legend": {
  475.         "alignAsTable": true,
  476.         "avg": false,
  477.         "current": true,
  478.         "max": true,
  479.         "min": true,
  480.         "show": true,
  481.         "total": false,
  482.         "values": true
  483.       },
  484.       "lines": true,
  485.       "linewidth": 1,
  486.       "nullPointMode": "null",
  487.       "options": {
  488.         "dataLinks": []
  489.       },
  490.       "percentage": false,
  491.       "pointradius": 2,
  492.       "points": false,
  493.       "renderer": "flot",
  494.       "seriesOverrides": [],
  495.       "spaceLength": 10,
  496.       "stack": false,
  497.       "steppedLine": false,
  498.       "targets": [
  499.         {
  500.           "alias": "",
  501.           "groupBy": [
  502.             {
  503.               "params": [
  504.                 "1m"
  505.               ],
  506.               "type": "time"
  507.             },
  508.             {
  509.               "params": [
  510.                 "hostname"
  511.               ],
  512.               "type": "tag"
  513.             }
  514.           ],
  515.           "hide": false,
  516.           "measurement": "kafkaServer",
  517.           "orderByTime": "ASC",
  518.           "policy": "default",
  519.           "refId": "D",
  520.           "resultFormat": "time_series",
  521.           "select": [
  522.             [
  523.               {
  524.                 "params": [
  525.                   "OneMinuteRate"
  526.                 ],
  527.                 "type": "field"
  528.               },
  529.               {
  530.                 "params": [],
  531.                 "type": "last"
  532.               },
  533.               {
  534.                 "params": [
  535.                   "Average per second"
  536.                 ],
  537.                 "type": "alias"
  538.               }
  539.             ]
  540.           ],
  541.           "tags": [
  542.             {
  543.               "key": "typeName",
  544.               "operator": "=",
  545.               "value": "type=BrokerTopicMetrics,name=MessagesInPerSec"
  546.             }
  547.           ]
  548.         },
  549.         {
  550.           "alias": "",
  551.           "groupBy": [
  552.             {
  553.               "params": [
  554.                 "1m"
  555.               ],
  556.               "type": "time"
  557.             }
  558.           ],
  559.           "hide": false,
  560.           "measurement": "kafkaServer",
  561.           "orderByTime": "ASC",
  562.           "policy": "default",
  563.           "refId": "A",
  564.           "resultFormat": "time_series",
  565.           "select": [
  566.             [
  567.               {
  568.                 "params": [
  569.                   "OneMinuteRate"
  570.                 ],
  571.                 "type": "field"
  572.               },
  573.               {
  574.                 "params": [],
  575.                 "type": "sum"
  576.               },
  577.               {
  578.                 "params": [
  579.                   "Average per second, all brokers"
  580.                 ],
  581.                 "type": "alias"
  582.               }
  583.             ]
  584.           ],
  585.           "tags": [
  586.             {
  587.               "key": "typeName",
  588.               "operator": "=",
  589.               "value": "type=BrokerTopicMetrics,name=MessagesInPerSec"
  590.             }
  591.           ]
  592.         }
  593.       ],
  594.       "thresholds": [],
  595.       "timeFrom": null,
  596.       "timeRegions": [],
  597.       "timeShift": null,
  598.       "title": "Kafka Topic Messages per Second",
  599.       "tooltip": {
  600.         "shared": true,
  601.         "sort": 0,
  602.         "value_type": "individual"
  603.       },
  604.       "type": "graph",
  605.       "xaxis": {
  606.         "buckets": null,
  607.         "mode": "time",
  608.         "name": null,
  609.         "show": true,
  610.         "values": []
  611.       },
  612.       "yaxes": [
  613.         {
  614.           "$$hashKey": "object:2118",
  615.           "format": "none",
  616.           "label": null,
  617.           "logBase": 1,
  618.           "max": null,
  619.           "min": null,
  620.           "show": true
  621.         },
  622.         {
  623.           "$$hashKey": "object:2119",
  624.           "format": "short",
  625.           "label": null,
  626.           "logBase": 1,
  627.           "max": null,
  628.           "min": null,
  629.           "show": true
  630.         }
  631.       ],
  632.       "yaxis": {
  633.         "align": false,
  634.         "alignLevel": null
  635.       }
  636.     },
  637.     {
  638.       "aliasColors": {},
  639.       "bars": false,
  640.       "dashLength": 10,
  641.       "dashes": false,
  642.       "datasource": "${DS_KAFKAMONITOR}",
  643.       "description": "java.lang:type=OperatingSystem\nFree physical memory on the server",
  644.       "fill": 1,
  645.       "fillGradient": 0,
  646.       "gridPos": {
  647.         "h": 12,
  648.         "w": 8,
  649.         "x": 8,
  650.         "y": 12
  651.       },
  652.       "hiddenSeries": false,
  653.       "id": 32,
  654.       "legend": {
  655.         "alignAsTable": true,
  656.         "avg": false,
  657.         "current": true,
  658.         "max": false,
  659.         "min": false,
  660.         "show": true,
  661.         "total": false,
  662.         "values": true
  663.       },
  664.       "lines": true,
  665.       "linewidth": 1,
  666.       "nullPointMode": "null",
  667.       "options": {
  668.         "dataLinks": []
  669.       },
  670.       "percentage": false,
  671.       "pointradius": 2,
  672.       "points": false,
  673.       "renderer": "flot",
  674.       "seriesOverrides": [],
  675.       "spaceLength": 10,
  676.       "stack": false,
  677.       "steppedLine": false,
  678.       "targets": [
  679.         {
  680.           "alias": "",
  681.           "groupBy": [
  682.             {
  683.               "params": [
  684.                 "1m"
  685.               ],
  686.               "type": "time"
  687.             },
  688.             {
  689.               "params": [
  690.                 "hostname"
  691.               ],
  692.               "type": "tag"
  693.             },
  694.             {
  695.               "params": [
  696.                 "null"
  697.               ],
  698.               "type": "fill"
  699.             }
  700.           ],
  701.           "measurement": "jvmMemory",
  702.           "orderByTime": "ASC",
  703.           "policy": "default",
  704.           "refId": "A",
  705.           "resultFormat": "time_series",
  706.           "select": [
  707.             [
  708.               {
  709.                 "params": [
  710.                   "FreePhysicalMemorySize"
  711.                 ],
  712.                 "type": "field"
  713.               },
  714.               {
  715.                 "params": [],
  716.                 "type": "last"
  717.               },
  718.               {
  719.                 "params": [
  720.                   "Free system physical memory"
  721.                 ],
  722.                 "type": "alias"
  723.               }
  724.             ]
  725.           ],
  726.           "tags": []
  727.         }
  728.       ],
  729.       "thresholds": [],
  730.       "timeFrom": null,
  731.       "timeRegions": [],
  732.       "timeShift": null,
  733.       "title": "Free Physical Memory",
  734.       "tooltip": {
  735.         "shared": true,
  736.         "sort": 0,
  737.         "value_type": "individual"
  738.       },
  739.       "type": "graph",
  740.       "xaxis": {
  741.         "buckets": null,
  742.         "mode": "time",
  743.         "name": null,
  744.         "show": true,
  745.         "values": []
  746.       },
  747.       "yaxes": [
  748.         {
  749.           "$$hashKey": "object:2324",
  750.           "format": "decbytes",
  751.           "label": null,
  752.           "logBase": 1,
  753.           "max": null,
  754.           "min": null,
  755.           "show": true
  756.         },
  757.         {
  758.           "$$hashKey": "object:2325",
  759.           "format": "short",
  760.           "label": null,
  761.           "logBase": 1,
  762.           "max": null,
  763.           "min": null,
  764.           "show": true
  765.         }
  766.       ],
  767.       "yaxis": {
  768.         "align": false,
  769.         "alignLevel": null
  770.       }
  771.     },
  772.     {
  773.       "aliasColors": {},
  774.       "bars": false,
  775.       "cacheTimeout": null,
  776.       "dashLength": 10,
  777.       "dashes": false,
  778.       "datasource": "${DS_KAFKAMONITOR}",
  779.       "description": "kafka.controller:type=KafkaController,name=ActiveControllerCount\n\nNumber of active Kafka controllers. Exactly one broker per cluster should report 1; that broker is the Kafka Controller.",
  780.       "fill": 1,
  781.       "fillGradient": 0,
  782.       "gridPos": {
  783.         "h": 12,
  784.         "w": 8,
  785.         "x": 16,
  786.         "y": 12
  787.       },
  788.       "hiddenSeries": false,
  789.       "id": 26,
  790.       "legend": {
  791.         "alignAsTable": true,
  792.         "avg": false,
  793.         "current": true,
  794.         "max": false,
  795.         "min": false,
  796.         "show": true,
  797.         "total": false,
  798.         "values": true
  799.       },
  800.       "lines": true,
  801.       "linewidth": 1,
  802.       "links": [],
  803.       "nullPointMode": "null",
  804.       "options": {
  805.         "dataLinks": []
  806.       },
  807.       "percentage": false,
  808.       "pluginVersion": "6.7.3",
  809.       "pointradius": 2,
  810.       "points": false,
  811.       "renderer": "flot",
  812.       "seriesOverrides": [],
  813.       "spaceLength": 10,
  814.       "stack": false,
  815.       "steppedLine": false,
  816.       "targets": [
  817.         {
  818.           "alias": "",
  819.           "groupBy": [
  820.             {
  821.               "params": [
  822.                 "1m"
  823.               ],
  824.               "type": "time"
  825.             },
  826.             {
  827.               "params": [
  828.                 "hostname"
  829.               ],
  830.               "type": "tag"
  831.             }
  832.           ],
  833.           "measurement": "activeController",
  834.           "orderByTime": "ASC",
  835.           "policy": "default",
  836.           "query": "SELECT sum(\"Value\") AS \"Active controller count\" FROM \"activeController\" WHERE $timeFilter GROUP BY time($__interval), \"hostname\"",
  837.           "rawQuery": false,
  838.           "refId": "A",
  839.           "resultFormat": "time_series",
  840.           "select": [
  841.             [
  842.               {
  843.                 "params": [
  844.                   "Value"
  845.                 ],
  846.                 "type": "field"
  847.               },
  848.               {
  849.                 "params": [],
  850.                 "type": "last"
  851.               },
  852.               {
  853.                 "params": [
  854.                   "Active controller count"
  855.                 ],
  856.                 "type": "alias"
  857.               }
  858.             ]
  859.           ],
  860.           "tags": [],
  861.           "tz": ""
  862.         }
  863.       ],
  864.       "thresholds": [],
  865.       "timeFrom": null,
  866.       "timeRegions": [],
  867.       "timeShift": null,
  868.       "title": "Kafka Controller Count",
  869.       "tooltip": {
  870.         "shared": true,
  871.         "sort": 0,
  872.         "value_type": "individual"
  873.       },
  874.       "type": "graph",
  875.       "xaxis": {
  876.         "buckets": null,
  877.         "mode": "time",
  878.         "name": null,
  879.         "show": true,
  880.         "values": []
  881.       },
  882.       "yaxes": [
  883.         {
  884.           "$$hashKey": "object:4446",
  885.           "format": "short",
  886.           "label": null,
  887.           "logBase": 1,
  888.           "max": null,
  889.           "min": null,
  890.           "show": true
  891.         },
  892.         {
  893.           "$$hashKey": "object:4447",
  894.           "format": "short",
  895.           "label": null,
  896.           "logBase": 1,
  897.           "max": null,
  898.           "min": null,
  899.           "show": true
  900.         }
  901.       ],
  902.       "yaxis": {
  903.         "align": false,
  904.         "alignLevel": null
  905.       }
  906.     },
  907.     {
  908.       "aliasColors": {},
  909.       "bars": false,
  910.       "dashLength": 10,
  911.       "dashes": false,
  912.       "datasource": "${DS_KAFKAMONITOR}",
  913.       "description": "Monitors the kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec metric",
  914.       "fill": 1,
  915.       "fillGradient": 0,
  916.       "gridPos": {
  917.         "h": 9,
  918.         "w": 8,
  919.         "x": 0,
  920.         "y": 24
  921.       },
  922.       "hiddenSeries": false,
  923.       "id": 16,
  924.       "legend": {
  925.         "alignAsTable": true,
  926.         "avg": true,
  927.         "current": true,
  928.         "max": true,
  929.         "min": true,
  930.         "show": true,
  931.         "total": false,
  932.         "values": true
  933.       },
  934.       "lines": true,
  935.       "linewidth": 1,
  936.       "nullPointMode": "null",
  937.       "options": {
  938.         "dataLinks": []
  939.       },
  940.       "percentage": false,
  941.       "pointradius": 2,
  942.       "points": false,
  943.       "renderer": "flot",
  944.       "seriesOverrides": [],
  945.       "spaceLength": 10,
  946.       "stack": false,
  947.       "steppedLine": false,
  948.       "targets": [
  949.         {
  950.           "alias": "",
  951.           "groupBy": [
  952.             {
  953.               "params": [
  954.                 "1m"
  955.               ],
  956.               "type": "time"
  957.             },
  958.             {
  959.               "params": [
  960.                 "hostname"
  961.               ],
  962.               "type": "tag"
  963.             },
  964.             {
  965.               "params": [
  966.                 "null"
  967.               ],
  968.               "type": "fill"
  969.             }
  970.           ],
  971.           "measurement": "kafkaServer",
  972.           "orderByTime": "ASC",
  973.           "policy": "default",
  974.           "refId": "A",
  975.           "resultFormat": "time_series",
  976.           "select": [
  977.             [
  978.               {
  979.                 "params": [
  980.                   "FiveMinuteRate"
  981.                 ],
  982.                 "type": "field"
  983.               },
  984.               {
  985.                 "params": [],
  986.                 "type": "mean"
  987.               },
  988.               {
  989.                 "params": [
  990.                   "Bytes fetched per second"
  991.                 ],
  992.                 "type": "alias"
  993.               }
  994.             ]
  995.           ],
  996.           "tags": [
  997.             {
  998.               "key": "typeName",
  999.               "operator": "=",
  1000.               "value": "type=BrokerTopicMetrics,name=BytesOutPerSec"
  1001.             }
  1002.           ]
  1003.         }
  1004.       ],
  1005.       "thresholds": [],
  1006.       "timeFrom": null,
  1007.       "timeRegions": [],
  1008.       "timeShift": null,
  1009.       "title": "Kafka Outbound Traffic per Second",
  1010.       "tooltip": {
  1011.         "shared": true,
  1012.         "sort": 0,
  1013.         "value_type": "individual"
  1014.       },
  1015.       "type": "graph",
  1016.       "xaxis": {
  1017.         "buckets": null,
  1018.         "mode": "time",
  1019.         "name": null,
  1020.         "show": true,
  1021.         "values": []
  1022.       },
  1023.       "yaxes": [
  1024.         {
  1025.           "$$hashKey": "object:77",
  1026.           "format": "decbytes",
  1027.           "label": null,
  1028.           "logBase": 1,
  1029.           "max": null,
  1030.           "min": null,
  1031.           "show": true
  1032.         },
  1033.         {
  1034.           "$$hashKey": "object:78",
  1035.           "format": "short",
  1036.           "label": null,
  1037.           "logBase": 1,
  1038.           "max": null,
  1039.           "min": null,
  1040.           "show": true
  1041.         }
  1042.       ],
  1043.       "yaxis": {
  1044.         "align": false,
  1045.         "alignLevel": null
  1046.       }
  1047.     },
  1048.     {
  1049.       "aliasColors": {},
  1050.       "bars": false,
  1051.       "dashLength": 10,
  1052.       "dashes": false,
  1053.       "datasource": "${DS_KAFKAMONITOR}",
  1054.       "description": "Monitors the kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec metric",
  1055.       "fill": 1,
  1056.       "fillGradient": 0,
  1057.       "gridPos": {
  1058.         "h": 9,
  1059.         "w": 8,
  1060.         "x": 8,
  1061.         "y": 24
  1062.       },
  1063.       "hiddenSeries": false,
  1064.       "id": 14,
  1065.       "legend": {
  1066.         "alignAsTable": true,
  1067.         "avg": true,
  1068.         "current": true,
  1069.         "max": true,
  1070.         "min": true,
  1071.         "show": true,
  1072.         "total": false,
  1073.         "values": true
  1074.       },
  1075.       "lines": true,
  1076.       "linewidth": 1,
  1077.       "nullPointMode": "null",
  1078.       "options": {
  1079.         "dataLinks": []
  1080.       },
  1081.       "percentage": false,
  1082.       "pointradius": 2,
  1083.       "points": false,
  1084.       "renderer": "flot",
  1085.       "seriesOverrides": [],
  1086.       "spaceLength": 10,
  1087.       "stack": false,
  1088.       "steppedLine": false,
  1089.       "targets": [
  1090.         {
  1091.           "alias": "",
  1092.           "groupBy": [
  1093.             {
  1094.               "params": [
  1095.                 "1m"
  1096.               ],
  1097.               "type": "time"
  1098.             },
  1099.             {
  1100.               "params": [
  1101.                 "hostname"
  1102.               ],
  1103.               "type": "tag"
  1104.             },
  1105.             {
  1106.               "params": [
  1107.                 "null"
  1108.               ],
  1109.               "type": "fill"
  1110.             }
  1111.           ],
  1112.           "measurement": "kafkaServer",
  1113.           "orderByTime": "ASC",
  1114.           "policy": "default",
  1115.           "refId": "F",
  1116.           "resultFormat": "time_series",
  1117.           "select": [
  1118.             [
  1119.               {
  1120.                 "params": [
  1121.                   "OneMinuteRate"
  1122.                 ],
  1123.                 "type": "field"
  1124.               },
  1125.               {
  1126.                 "params": [],
  1127.                 "type": "last"
  1128.               },
  1129.               {
  1130.                 "params": [
  1131.                   "Average bytes in per second"
  1132.                 ],
  1133.                 "type": "alias"
  1134.               }
  1135.             ]
  1136.           ],
  1137.           "tags": [
  1138.             {
  1139.               "key": "typeName",
  1140.               "operator": "=",
  1141.               "value": "type=BrokerTopicMetrics,name=BytesInPerSec"
  1142.             }
  1143.           ]
  1144.         }
  1145.       ],
  1146.       "thresholds": [],
  1147.       "timeFrom": null,
  1148.       "timeRegions": [],
  1149.       "timeShift": null,
  1150.       "title": "Kafka Inbound Traffic per Second",
  1151.       "tooltip": {
  1152.         "shared": true,
  1153.         "sort": 0,
  1154.         "value_type": "individual"
  1155.       },
  1156.       "type": "graph",
  1157.       "xaxis": {
  1158.         "buckets": null,
  1159.         "mode": "time",
  1160.         "name": null,
  1161.         "show": true,
  1162.         "values": []
  1163.       },
  1164.       "yaxes": [
  1165.         {
  1166.           "$$hashKey": "object:77",
  1167.           "format": "decbytes",
  1168.           "label": null,
  1169.           "logBase": 1,
  1170.           "max": null,
  1171.           "min": null,
  1172.           "show": true
  1173.         },
  1174.         {
  1175.           "$$hashKey": "object:78",
  1176.           "format": "short",
  1177.           "label": null,
  1178.           "logBase": 1,
  1179.           "max": null,
  1180.           "min": null,
  1181.           "show": true
  1182.         }
  1183.       ],
  1184.       "yaxis": {
  1185.         "align": false,
  1186.         "alignLevel": null
  1187.       }
  1188.     },
  1189.     {
  1190.       "aliasColors": {},
  1191.       "bars": false,
  1192.       "dashLength": 10,
  1193.       "dashes": false,
  1194.       "datasource": "${DS_KAFKAMONITOR}",
  1195.       "description": "Monitors the kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec and kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec metrics",
  1196.       "fill": 1,
  1197.       "fillGradient": 0,
  1198.       "gridPos": {
  1199.         "h": 9,
  1200.         "w": 8,
  1201.         "x": 16,
  1202.         "y": 24
  1203.       },
  1204.       "hiddenSeries": false,
  1205.       "id": 20,
  1206.       "legend": {
  1207.         "alignAsTable": true,
  1208.         "avg": true,
  1209.         "current": true,
  1210.         "max": true,
  1211.         "min": true,
  1212.         "show": true,
  1213.         "total": false,
  1214.         "values": true
  1215.       },
  1216.       "lines": true,
  1217.       "linewidth": 1,
  1218.       "nullPointMode": "null",
  1219.       "options": {
  1220.         "dataLinks": []
  1221.       },
  1222.       "percentage": false,
  1223.       "pointradius": 2,
  1224.       "points": false,
  1225.       "renderer": "flot",
  1226.       "seriesOverrides": [],
  1227.       "spaceLength": 10,
  1228.       "stack": false,
  1229.       "steppedLine": false,
  1230.       "targets": [
  1231.         {
  1232.           "alias": "",
  1233.           "groupBy": [
  1234.             {
  1235.               "params": [
  1236.                 "1m"
  1237.               ],
  1238.               "type": "time"
  1239.             },
  1240.             {
  1241.               "params": [
  1242.                 "hostname"
  1243.               ],
  1244.               "type": "tag"
  1245.             },
  1246.             {
  1247.               "params": [
  1248.                 "null"
  1249.               ],
  1250.               "type": "fill"
  1251.             }
  1252.           ],
  1253.           "measurement": "kafkaServer",
  1254.           "orderByTime": "ASC",
  1255.           "policy": "default",
  1256.           "refId": "A",
  1257.           "resultFormat": "time_series",
  1258.           "select": [
  1259.             [
  1260.               {
  1261.                 "params": [
  1262.                   "OneMinuteRate"
  1263.                 ],
  1264.                 "type": "field"
  1265.               },
  1266.               {
  1267.                 "params": [],
  1268.                 "type": "last"
  1269.               },
  1270.               {
  1271.                 "params": [
  1272.                   "Fetch requests per second"
  1273.                 ],
  1274.                 "type": "alias"
  1275.               }
  1276.             ]
  1277.           ],
  1278.           "tags": [
  1279.             {
  1280.               "key": "typeName",
  1281.               "operator": "=",
  1282.               "value": "type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec"
  1283.             }
  1284.           ]
  1285.         },
  1286.         {
  1287.           "alias": "",
  1288.           "groupBy": [
  1289.             {
  1290.               "params": [
  1291.                 "1m"
  1292.               ],
  1293.               "type": "time"
  1294.             },
  1295.             {
  1296.               "params": [
  1297.                 "hostname"
  1298.               ],
  1299.               "type": "tag"
  1300.             },
  1301.             {
  1302.               "params": [
  1303.                 "null"
  1304.               ],
  1305.               "type": "fill"
  1306.             }
  1307.           ],
  1308.           "measurement": "kafkaServer",
  1309.           "orderByTime": "ASC",
  1310.           "policy": "default",
  1311.           "refId": "D",
  1312.           "resultFormat": "time_series",
  1313.           "select": [
  1314.             [
  1315.               {
  1316.                 "params": [
  1317.                   "MeanRate"
  1318.                 ],
  1319.                 "type": "field"
  1320.               },
  1321.               {
  1322.                 "params": [],
  1323.                 "type": "last"
  1324.               },
  1325.               {
  1326.                 "params": [
  1327.                   "Produce requests per second"
  1328.                 ],
  1329.                 "type": "alias"
  1330.               }
  1331.             ]
  1332.           ],
  1333.           "tags": [
  1334.             {
  1335.               "key": "typeName",
  1336.               "operator": "=",
  1337.               "value": "type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec"
  1338.             }
  1339.           ]
  1340.         }
  1341.       ],
  1342.       "thresholds": [],
  1343.       "timeFrom": null,
  1344.       "timeRegions": [],
  1345.       "timeShift": null,
  1346.       "title": "Kafka Produce and Fetch Requests per Second",
  1347.       "tooltip": {
  1348.         "shared": true,
  1349.         "sort": 0,
  1350.         "value_type": "individual"
  1351.       },
  1352.       "type": "graph",
  1353.       "xaxis": {
  1354.         "buckets": null,
  1355.         "mode": "time",
  1356.         "name": null,
  1357.         "show": true,
  1358.         "values": []
  1359.       },
  1360.       "yaxes": [
  1361.         {
  1362.           "$$hashKey": "object:77",
  1363.           "format": "short",
  1364.           "label": null,
  1365.           "logBase": 1,
  1366.           "max": null,
  1367.           "min": null,
  1368.           "show": true
  1369.         },
  1370.         {
  1371.           "$$hashKey": "object:78",
  1372.           "format": "short",
  1373.           "label": null,
  1374.           "logBase": 1,
  1375.           "max": null,
  1376.           "min": null,
  1377.           "show": true
  1378.         }
  1379.       ],
  1380.       "yaxis": {
  1381.         "align": false,
  1382.         "alignLevel": null
  1383.       }
  1384.     },
  1385.     {
  1386.       "aliasColors": {},
  1387.       "bars": false,
  1388.       "dashLength": 10,
  1389.       "dashes": false,
  1390.       "datasource": "${DS_KAFKAMONITOR}",
  1391.       "description": "java.lang:type=Memory",
  1392.       "fill": 1,
  1393.       "fillGradient": 0,
  1394.       "gridPos": {
  1395.         "h": 13,
  1396.         "w": 8,
  1397.         "x": 0,
  1398.         "y": 33
  1399.       },
  1400.       "hiddenSeries": false,
  1401.       "id": 8,
  1402.       "legend": {
  1403.         "alignAsTable": true,
  1404.         "avg": true,
  1405.         "current": true,
  1406.         "max": true,
  1407.         "min": true,
  1408.         "show": true,
  1409.         "total": false,
  1410.         "values": true
  1411.       },
  1412.       "lines": true,
  1413.       "linewidth": 1,
  1414.       "nullPointMode": "null",
  1415.       "options": {
  1416.         "dataLinks": []
  1417.       },
  1418.       "percentage": false,
  1419.       "pointradius": 2,
  1420.       "points": false,
  1421.       "renderer": "flot",
  1422.       "seriesOverrides": [],
  1423.       "spaceLength": 10,
  1424.       "stack": false,
  1425.       "steppedLine": false,
  1426.       "targets": [
  1427.         {
  1428.           "alias": "",
  1429.           "groupBy": [
  1430.             {
  1431.               "params": [
  1432.                 "1m"
  1433.               ],
  1434.               "type": "time"
  1435.             },
  1436.             {
  1437.               "params": [
  1438.                 "hostname"
  1439.               ],
  1440.               "type": "tag"
  1441.             },
  1442.             {
  1443.               "params": [
  1444.                 "null"
  1445.               ],
  1446.               "type": "fill"
  1447.             }
  1448.           ],
  1449.           "measurement": "jvmMemory",
  1450.           "orderByTime": "ASC",
  1451.           "policy": "default",
  1452.           "refId": "E",
  1453.           "resultFormat": "time_series",
  1454.           "select": [
  1455.             [
  1456.               {
  1457.                 "params": [
  1458.                   "HeapMemoryUsage_used"
  1459.                 ],
  1460.                 "type": "field"
  1461.               },
  1462.               {
  1463.                 "params": [],
  1464.                 "type": "last"
  1465.               },
  1466.               {
  1467.                 "params": [
  1468.                   "Heap memory used"
  1469.                 ],
  1470.                 "type": "alias"
  1471.               }
  1472.             ]
  1473.           ],
  1474.           "tags": []
  1475.         }
  1476.       ],
  1477.       "thresholds": [],
  1478.       "timeFrom": null,
  1479.       "timeRegions": [],
  1480.       "timeShift": null,
  1481.       "title": "Kafka Heap Memory Used",
  1482.       "tooltip": {
  1483.         "shared": true,
  1484.         "sort": 0,
  1485.         "value_type": "individual"
  1486.       },
  1487.       "type": "graph",
  1488.       "xaxis": {
  1489.         "buckets": null,
  1490.         "mode": "time",
  1491.         "name": null,
  1492.         "show": true,
  1493.         "values": []
  1494.       },
  1495.       "yaxes": [
  1496.         {
  1497.           "$$hashKey": "object:1850",
  1498.           "format": "decbytes",
  1499.           "label": null,
  1500.           "logBase": 1,
  1501.           "max": null,
  1502.           "min": null,
  1503.           "show": true
  1504.         },
  1505.         {
  1506.           "$$hashKey": "object:1851",
  1507.           "format": "short",
  1508.           "label": null,
  1509.           "logBase": 1,
  1510.           "max": null,
  1511.           "min": null,
  1512.           "show": true
  1513.         }
  1514.       ],
  1515.       "yaxis": {
  1516.         "align": false,
  1517.         "alignLevel": null
  1518.       }
  1519.     },
  1520.     {
  1521.       "aliasColors": {},
  1522.       "bars": false,
  1523.       "dashLength": 10,
  1524.       "dashes": false,
  1525.       "datasource": "${DS_KAFKAMONITOR}",
  1526.       "description": "java.lang:type=Memory",
  1527.       "fill": 1,
  1528.       "fillGradient": 0,
  1529.       "gridPos": {
  1530.         "h": 13,
  1531.         "w": 8,
  1532.         "x": 8,
  1533.         "y": 33
  1534.       },
  1535.       "hiddenSeries": false,
  1536.       "id": 30,
  1537.       "legend": {
  1538.         "alignAsTable": true,
  1539.         "avg": true,
  1540.         "current": true,
  1541.         "max": true,
  1542.         "min": true,
  1543.         "show": true,
  1544.         "total": false,
  1545.         "values": true
  1546.       },
  1547.       "lines": true,
  1548.       "linewidth": 1,
  1549.       "nullPointMode": "null",
  1550.       "options": {
  1551.         "dataLinks": []
  1552.       },
  1553.       "percentage": false,
  1554.       "pointradius": 2,
  1555.       "points": false,
  1556.       "renderer": "flot",
  1557.       "seriesOverrides": [],
  1558.       "spaceLength": 10,
  1559.       "stack": false,
  1560.       "steppedLine": false,
  1561.       "targets": [
  1562.         {
  1563.           "alias": "",
  1564.           "groupBy": [
  1565.             {
  1566.               "params": [
  1567.                 "1m"
  1568.               ],
  1569.               "type": "time"
  1570.             },
  1571.             {
  1572.               "params": [
  1573.                 "hostname"
  1574.               ],
  1575.               "type": "tag"
  1576.             },
  1577.             {
  1578.               "params": [
  1579.                 "null"
  1580.               ],
  1581.               "type": "fill"
  1582.             }
  1583.           ],
  1584.           "measurement": "jvmMemory",
  1585.           "orderByTime": "ASC",
  1586.           "policy": "default",
  1587.           "refId": "E",
  1588.           "resultFormat": "time_series",
  1589.           "select": [
  1590.             [
  1591.               {
  1592.                 "params": [
  1593.                   "NonHeapMemoryUsage_used"
  1594.                 ],
  1595.                 "type": "field"
  1596.               },
  1597.               {
  1598.                 "params": [],
  1599.                 "type": "last"
  1600.               },
  1601.               {
  1602.                 "params": [
  1603.                   "Non-heap memory used"
  1604.                 ],
  1605.                 "type": "alias"
  1606.               }
  1607.             ]
  1608.           ],
  1609.           "tags": []
  1610.         }
  1611.       ],
  1612.       "thresholds": [],
  1613.       "timeFrom": null,
  1614.       "timeRegions": [],
  1615.       "timeShift": null,
  1616.       "title": "Kafka non-heap memory usage",
  1617.       "tooltip": {
  1618.         "shared": true,
  1619.         "sort": 0,
  1620.         "value_type": "individual"
  1621.       },
  1622.       "type": "graph",
  1623.       "xaxis": {
  1624.         "buckets": null,
  1625.         "mode": "time",
  1626.         "name": null,
  1627.         "show": true,
  1628.         "values": []
  1629.       },
  1630.       "yaxes": [
  1631.         {
  1632.           "$$hashKey": "object:1850",
  1633.           "format": "decbytes",
  1634.           "label": null,
  1635.           "logBase": 1,
  1636.           "max": null,
  1637.           "min": null,
  1638.           "show": true
  1639.         },
  1640.         {
  1641.           "$$hashKey": "object:1851",
  1642.           "format": "short",
  1643.           "label": null,
  1644.           "logBase": 1,
  1645.           "max": null,
  1646.           "min": null,
  1647.           "show": true
  1648.         }
  1649.       ],
  1650.       "yaxis": {
  1651.         "align": false,
  1652.         "alignLevel": null
  1653.       }
  1654.     },
  1655.     {
  1656.       "aliasColors": {},
  1657.       "bars": false,
  1658.       "dashLength": 10,
  1659.       "dashes": false,
  1660.       "datasource": "${DS_KAFKAMONITOR}",
  1661.       "description": "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions\nA non-zero value means some replicas are not keeping up with the leader",
  1662.       "fill": 1,
  1663.       "fillGradient": 0,
  1664.       "gridPos": {
  1665.         "h": 13,
  1666.         "w": 8,
  1667.         "x": 16,
  1668.         "y": 33
  1669.       },
  1670.       "hiddenSeries": false,
  1671.       "id": 24,
  1672.       "legend": {
  1673.         "alignAsTable": true,
  1674.         "avg": false,
  1675.         "current": true,
  1676.         "max": true,
  1677.         "min": true,
  1678.         "show": true,
  1679.         "total": false,
  1680.         "values": true
  1681.       },
  1682.       "lines": true,
  1683.       "linewidth": 1,
  1684.       "nullPointMode": "null",
  1685.       "options": {
  1686.         "dataLinks": []
  1687.       },
  1688.       "percentage": false,
  1689.       "pluginVersion": "6.7.3",
  1690.       "pointradius": 2,
  1691.       "points": false,
  1692.       "renderer": "flot",
  1693.       "seriesOverrides": [],
  1694.       "spaceLength": 10,
  1695.       "stack": false,
  1696.       "steppedLine": false,
  1697.       "targets": [
  1698.         {
  1699.           "alias": "",
  1700.           "groupBy": [
  1701.             {
  1702.               "params": [
  1703.                 "1m"
  1704.               ],
  1705.               "type": "time"
  1706.             },
  1707.             {
  1708.               "params": [
  1709.                 "hostname"
  1710.               ],
  1711.               "type": "tag"
  1712.             },
  1713.             {
  1714.               "params": [
  1715.                 "null"
  1716.               ],
  1717.               "type": "fill"
  1718.             }
  1719.           ],
  1720.           "measurement": "underReplicated",
  1721.           "orderByTime": "ASC",
  1722.           "policy": "default",
  1723.           "refId": "A",
  1724.           "resultFormat": "time_series",
  1725.           "select": [
  1726.             [
  1727.               {
  1728.                 "params": [
  1729.                   "Value"
  1730.                 ],
  1731.                 "type": "field"
  1732.               },
  1733.               {
  1734.                 "params": [],
  1735.                 "type": "last"
  1736.               },
  1737.               {
  1738.                 "params": [
  1739.                   "Under-replicated partitions"
  1740.                 ],
  1741.                 "type": "alias"
  1742.               }
  1743.             ]
  1744.           ],
  1745.           "tags": []
  1746.         }
  1747.       ],
  1748.       "thresholds": [],
  1749.       "timeFrom": null,
  1750.       "timeRegions": [],
  1751.       "timeShift": null,
  1752.       "title": "Under-replicated partition count",
  1753.       "tooltip": {
  1754.         "shared": true,
  1755.         "sort": 0,
  1756.         "value_type": "individual"
  1757.       },
  1758.       "type": "graph",
  1759.       "xaxis": {
  1760.         "buckets": null,
  1761.         "mode": "time",
  1762.         "name": null,
  1763.         "show": true,
  1764.         "values": []
  1765.       },
  1766.       "yaxes": [
  1767.         {
  1768.           "$$hashKey": "object:11235",
  1769.           "format": "short",
  1770.           "label": null,
  1771.           "logBase": 1,
  1772.           "max": null,
  1773.           "min": null,
  1774.           "show": true
  1775.         },
  1776.         {
  1777.           "$$hashKey": "object:11236",
  1778.           "format": "short",
  1779.           "label": null,
  1780.           "logBase": 1,
  1781.           "max": null,
  1782.           "min": null,
  1783.           "show": true
  1784.         }
  1785.       ],
  1786.       "yaxis": {
  1787.         "align": false,
  1788.         "alignLevel": null
  1789.       }
  1790.     },
  1791.     {
  1792.       "aliasColors": {},
  1793.       "bars": false,
  1794.       "cacheTimeout": null,
  1795.       "dashLength": 10,
  1796.       "dashes": false,
  1797.       "datasource": "${DS_KAFKAMONITOR}",
  1798.       "description": "kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent",
  1799.       "fill": 1,
  1800.       "fillGradient": 0,
  1801.       "gridPos": {
  1802.         "h": 13,
  1803.         "w": 8,
  1804.         "x": 0,
  1805.         "y": 46
  1806.       },
  1807.       "hiddenSeries": false,
  1808.       "id": 12,
  1809.       "legend": {
  1810.         "alignAsTable": true,
  1811.         "avg": false,
  1812.         "current": true,
  1813.         "max": true,
  1814.         "min": true,
  1815.         "show": true,
  1816.         "total": false,
  1817.         "values": true
  1818.       },
  1819.       "lines": true,
  1820.       "linewidth": 1,
  1821.       "links": [],
  1822.       "nullPointMode": "null",
  1823.       "options": {
  1824.         "dataLinks": []
  1825.       },
  1826.       "percentage": false,
  1827.       "pluginVersion": "6.7.3",
  1828.       "pointradius": 2,
  1829.       "points": false,
  1830.       "renderer": "flot",
  1831.       "seriesOverrides": [],
  1832.       "spaceLength": 10,
  1833.       "stack": false,
  1834.       "steppedLine": false,
  1835.       "targets": [
  1836.         {
  1837.           "alias": "",
  1838.           "groupBy": [
  1839.             {
  1840.               "params": [
  1841.                 "5m"
  1842.               ],
  1843.               "type": "time"
  1844.             },
  1845.             {
  1846.               "params": [
  1847.                 "hostname"
  1848.               ],
  1849.               "type": "tag"
  1850.             },
  1851.             {
  1852.               "params": [
  1853.                 "null"
  1854.               ],
  1855.               "type": "fill"
  1856.             }
  1857.           ],
  1858.           "measurement": "network",
  1859.           "orderByTime": "ASC",
  1860.           "policy": "default",
  1861.           "refId": "A",
  1862.           "resultFormat": "time_series",
  1863.           "select": [
  1864.             [
  1865.               {
  1866.                 "params": [
  1867.                   "Value"
  1868.                 ],
  1869.                 "type": "field"
  1870.               },
  1871.               {
  1872.                 "params": [],
  1873.                 "type": "mean"
  1874.               },
  1875.               {
  1876.                 "params": [
  1877.                   "Network thread pool idle ratio"
  1878.                 ],
  1879.                 "type": "alias"
  1880.               }
  1881.             ]
  1882.           ],
  1883.           "tags": []
  1884.         }
  1885.       ],
  1886.       "thresholds": [],
  1887.       "timeFrom": null,
  1888.       "timeRegions": [],
  1889.       "timeShift": null,
  1890.       "title": "Kafka network thread pool average idle ratio",
  1891.       "tooltip": {
  1892.         "shared": true,
  1893.         "sort": 0,
  1894.         "value_type": "individual"
  1895.       },
  1896.       "type": "graph",
  1897.       "xaxis": {
  1898.         "buckets": null,
  1899.         "mode": "time",
  1900.         "name": null,
  1901.         "show": true,
  1902.         "values": []
  1903.       },
  1904.       "yaxes": [
  1905.         {
  1906.           "$$hashKey": "object:13734",
  1907.           "format": "percentunit",
  1908.           "label": null,
  1909.           "logBase": 1,
  1910.           "max": null,
  1911.           "min": null,
  1912.           "show": true
  1913.         },
  1914.         {
  1915.           "$$hashKey": "object:13735",
  1916.           "format": "short",
  1917.           "label": null,
  1918.           "logBase": 1,
  1919.           "max": null,
  1920.           "min": null,
  1921.           "show": true
  1922.         }
  1923.       ],
  1924.       "yaxis": {
  1925.         "align": false,
  1926.         "alignLevel": null
  1927.       }
  1928.     },
  1929.     {
  1930.       "aliasColors": {},
  1931.       "bars": false,
  1932.       "cacheTimeout": null,
  1933.       "dashLength": 10,
  1934.       "dashes": false,
  1935.       "datasource": "${DS_KAFKAMONITOR}",
  1936.       "description": "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
  1937.       "fill": 1,
  1938.       "fillGradient": 0,
  1939.       "gridPos": {
  1940.         "h": 13,
  1941.         "w": 8,
  1942.         "x": 8,
  1943.         "y": 46
  1944.       },
  1945.       "hiddenSeries": false,
  1946.       "id": 22,
  1947.       "legend": {
  1948.         "alignAsTable": true,
  1949.         "avg": false,
  1950.         "current": true,
  1951.         "max": true,
  1952.         "min": true,
  1953.         "show": true,
  1954.         "total": false,
  1955.         "values": true
  1956.       },
  1957.       "lines": true,
  1958.       "linewidth": 1,
  1959.       "links": [],
  1960.       "nullPointMode": "null",
  1961.       "options": {
  1962.         "dataLinks": []
  1963.       },
  1964.       "percentage": false,
  1965.       "pluginVersion": "6.7.3",
  1966.       "pointradius": 2,
  1967.       "points": false,
  1968.       "renderer": "flot",
  1969.       "seriesOverrides": [],
  1970.       "spaceLength": 10,
  1971.       "stack": false,
  1972.       "steppedLine": false,
  1973.       "targets": [
  1974.         {
  1975.           "alias": "",
  1976.           "groupBy": [
  1977.             {
  1978.               "params": [
  1979.                 "1m"
  1980.               ],
  1981.               "type": "time"
  1982.             },
  1983.             {
  1984.               "params": [
  1985.                 "hostname"
  1986.               ],
  1987.               "type": "tag"
  1988.             }
  1989.           ],
  1990.           "measurement": "network",
  1991.           "orderByTime": "ASC",
  1992.           "policy": "default",
  1993.           "refId": "A",
  1994.           "resultFormat": "time_series",
  1995.           "select": [
  1996.             [
  1997.               {
  1998.                 "params": [
  1999.                   "OneMinuteRate"
  2000.                 ],
  2001.                 "type": "field"
  2002.               },
  2003.               {
  2004.                 "params": [],
  2005.                 "type": "last"
  2006.               },
  2007.               {
  2008.                 "params": [
  2009.                   "I/O thread pool idle ratio"
  2010.                 ],
  2011.                 "type": "alias"
  2012.               }
  2013.             ]
  2014.           ],
  2015.           "tags": [
  2016.             {
  2017.               "key": "typeName",
  2018.               "operator": "=",
  2019.               "value": "type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent"
  2020.             }
  2021.           ]
  2022.         }
  2023.       ],
  2024.       "thresholds": [],
  2025.       "timeFrom": null,
  2026.       "timeRegions": [],
  2027.       "timeShift": null,
  2028.       "title": "Kafka I/O thread pool average idle ratio",
  2029.       "tooltip": {
  2030.         "shared": true,
  2031.         "sort": 0,
  2032.         "value_type": "individual"
  2033.       },
  2034.       "type": "graph",
  2035.       "xaxis": {
  2036.         "buckets": null,
  2037.         "mode": "time",
  2038.         "name": null,
  2039.         "show": true,
  2040.         "values": []
  2041.       },
  2042.       "yaxes": [
  2043.         {
  2044.           "$$hashKey": "object:13517",
  2045.           "format": "percentunit",
  2046.           "label": null,
  2047.           "logBase": 1,
  2048.           "max": null,
  2049.           "min": null,
  2050.           "show": true
  2051.         },
  2052.         {
  2053.           "$$hashKey": "object:13518",
  2054.           "format": "short",
  2055.           "label": null,
  2056.           "logBase": 1,
  2057.           "max": null,
  2058.           "min": null,
  2059.           "show": true
  2060.         }
  2061.       ],
  2062.       "yaxis": {
  2063.         "align": false,
  2064.         "alignLevel": null
  2065.       }
  2066.     },
  2067.     {
  2068.       "aliasColors": {},
  2069.       "bars": false,
  2070.       "dashLength": 10,
  2071.       "dashes": false,
  2072.       "datasource": "${DS_KAFKAMONITOR}",
  2073.       "description": "Monitors the kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec and kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec metrics",
  2074.       "fill": 1,
  2075.       "fillGradient": 0,
  2076.       "gridPos": {
  2077.         "h": 13,
  2078.         "w": 8,
  2079.         "x": 16,
  2080.         "y": 46
  2081.       },
  2082.       "hiddenSeries": false,
  2083.       "id": 18,
  2084.       "legend": {
  2085.         "alignAsTable": true,
  2086.         "avg": true,
  2087.         "current": true,
  2088.         "max": true,
  2089.         "min": true,
  2090.         "show": true,
  2091.         "total": false,
  2092.         "values": true
  2093.       },
  2094.       "lines": true,
  2095.       "linewidth": 1,
  2096.       "nullPointMode": "null",
  2097.       "options": {
  2098.         "dataLinks": []
  2099.       },
  2100.       "percentage": false,
  2101.       "pointradius": 2,
  2102.       "points": false,
  2103.       "renderer": "flot",
  2104.       "seriesOverrides": [],
  2105.       "spaceLength": 10,
  2106.       "stack": false,
  2107.       "steppedLine": false,
  2108.       "targets": [
  2109.         {
  2110.           "alias": "",
  2111.           "groupBy": [
  2112.             {
  2113.               "params": [
  2114.                 "1m"
  2115.               ],
  2116.               "type": "time"
  2117.             },
  2118.             {
  2119.               "params": [
  2120.                 "hostname"
  2121.               ],
  2122.               "type": "tag"
  2123.             },
  2124.             {
  2125.               "params": [
  2126.                 "null"
  2127.               ],
  2128.               "type": "fill"
  2129.             }
  2130.           ],
  2131.           "measurement": "kafkaServer",
  2132.           "orderByTime": "ASC",
  2133.           "policy": "default",
  2134.           "refId": "H",
  2135.           "resultFormat": "time_series",
  2136.           "select": [
  2137.             [
  2138.               {
  2139.                 "params": [
  2140.                   "OneMinuteRate"
  2141.                 ],
  2142.                 "type": "field"
  2143.               },
  2144.               {
  2145.                 "params": [],
  2146.                 "type": "last"
  2147.               },
  2148.               {
  2149.                 "params": [
  2150.                   "Failed fetch requests per second"
  2151.                 ],
  2152.                 "type": "alias"
  2153.               }
  2154.             ]
  2155.           ],
  2156.           "tags": [
  2157.             {
  2158.               "key": "typeName",
  2159.               "operator": "=",
  2160.               "value": "type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec"
  2161.             }
  2162.           ]
  2163.         },
  2164.         {
  2165.           "alias": "",
  2166.           "groupBy": [
  2167.             {
  2168.               "params": [
  2169.                 "1m"
  2170.               ],
  2171.               "type": "time"
  2172.             },
  2173.             {
  2174.               "params": [
  2175.                 "hostname"
  2176.               ],
  2177.               "type": "tag"
  2178.             },
  2179.             {
  2180.               "params": [
  2181.                 "null"
  2182.               ],
  2183.               "type": "fill"
  2184.             }
  2185.           ],
  2186.           "measurement": "kafkaServer",
  2187.           "orderByTime": "ASC",
  2188.           "policy": "default",
  2189.           "refId": "J",
  2190.           "resultFormat": "time_series",
  2191.           "select": [
  2192.             [
  2193.               {
  2194.                 "params": [
  2195.                   "MeanRate"
  2196.                 ],
  2197.                 "type": "field"
  2198.               },
  2199.               {
  2200.                 "params": [],
  2201.                 "type": "last"
  2202.               },
  2203.               {
  2204.                 "params": [
  2205.                   "Failed produce requests per second"
  2206.                 ],
  2207.                 "type": "alias"
  2208.               }
  2209.             ]
  2210.           ],
  2211.           "tags": [
  2212.             {
  2213.               "key": "typeName",
  2214.               "operator": "=",
  2215.               "value": "type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec"
  2216.             }
  2217.           ]
  2218.         }
  2219.       ],
  2220.       "thresholds": [],
  2221.       "timeFrom": null,
  2222.       "timeRegions": [],
  2223.       "timeShift": null,
  2224.       "title": "Kafka failed produce and fetch request rate",
  2225.       "tooltip": {
  2226.         "shared": true,
  2227.         "sort": 0,
  2228.         "value_type": "individual"
  2229.       },
  2230.       "type": "graph",
  2231.       "xaxis": {
  2232.         "buckets": null,
  2233.         "mode": "time",
  2234.         "name": null,
  2235.         "show": true,
  2236.         "values": []
  2237.       },
  2238.       "yaxes": [
  2239.         {
  2240.           "$$hashKey": "object:77",
  2241.           "format": "short",
  2242.           "label": null,
  2243.           "logBase": 1,
  2244.           "max": null,
  2245.           "min": null,
  2246.           "show": true
  2247.         },
  2248.         {
  2249.           "$$hashKey": "object:78",
  2250.           "format": "short",
  2251.           "label": null,
  2252.           "logBase": 1,
  2253.           "max": null,
  2254.           "min": null,
  2255.           "show": true
  2256.         }
  2257.       ],
  2258.       "yaxis": {
  2259.         "align": false,
  2260.         "alignLevel": null
  2261.       }
  2262.     }
  2263.   ],
  2264.   "refresh": false,
  2265.   "schemaVersion": 22,
  2266.   "style": "dark",
  2267.   "tags": [],
  2268.   "templating": {
  2269.     "list": []
  2270.   },
  2271.   "time": {
  2272.     "from": "now-1h",
  2273.     "to": "now"
  2274.   },
  2275.   "timepicker": {
  2276.     "refresh_intervals": [
  2277.       "5s",
  2278.       "10s",
  2279.       "30s",
  2280.       "1m",
  2281.       "5m",
  2282.       "15m",
  2283.       "30m",
  2284.       "1h",
  2285.       "2h",
  2286.       "1d"
  2287.     ]
  2288.   },
  2289.   "timezone": "",
  2290.       "title": "Kafka cluster monitoring template",
  2291.   "uid": "PkULDneZkALL",
  2292.   "variables": {
  2293.     "list": []
  2294.   },
  2295.   "version": 27
  2296. }
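Each entry under "targets" in the template is a structured description of one InfluxQL query. The following Python sketch shows roughly how such a target maps to a query string, which can help when checking that a panel's measurement, field, and tag names match what JMXTrans actually writes. It is an approximation of what Grafana's query builder does, not Grafana's own code: the function name `render_influxql` and the alias `heap_used` are illustrative assumptions, and multiple tag filters are simply joined with AND.

```python
def render_influxql(target):
    """Approximate the InfluxQL query described by one Grafana panel target."""
    columns = []
    for parts in target["select"]:
        expr = ""
        for part in parts:
            kind = part["type"]
            params = part.get("params", [])
            if kind == "field":
                expr = '"' + params[0] + '"'
            elif kind == "alias":
                expr = expr + ' AS "' + params[0] + '"'
            else:
                # Aggregations such as last or mean wrap the expression built so far.
                expr = kind + "(" + expr + ")"
        columns.append(expr)

    # Tag filters come first; the dashboard time range is always appended.
    tag_filters = [
        '"{}" {} \'{}\''.format(t["key"], t.get("operator", "="), t["value"])
        for t in target.get("tags", [])
    ]
    where = "$timeFilter"
    if tag_filters:
        where = "(" + " AND ".join(tag_filters) + ") AND " + where

    group_by = []
    fill = ""
    for g in target.get("groupBy", []):
        kind, param = g["type"], g["params"][0]
        if kind == "time":
            group_by.append("time(" + param + ")")
        elif kind == "tag":
            group_by.append('"' + param + '"')
        elif kind == "fill":
            fill = " fill(" + param + ")"

    query = ("SELECT " + ", ".join(columns)
             + ' FROM "' + target["measurement"] + '" WHERE ' + where)
    if group_by:
        query += " GROUP BY " + ", ".join(group_by)
    return query + fill


# Same shape as the heap memory panel target; the alias is a hypothetical name.
heap_target = {
    "measurement": "jvmMemory",
    "groupBy": [
        {"type": "time", "params": ["1m"]},
        {"type": "tag", "params": ["hostname"]},
        {"type": "fill", "params": ["null"]},
    ],
    "select": [[
        {"type": "field", "params": ["HeapMemoryUsage_used"]},
        {"type": "last", "params": []},
        {"type": "alias", "params": ["heap_used"]},
    ]],
    "tags": [],
}
print(render_influxql(heap_target))
```

Pasting the printed query into the InfluxDB shell, with `$timeFilter` replaced by a concrete condition such as `time > now() - 1h`, is a quick way to see whether a panel is empty because of a wrong name or because no data has arrived yet.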

5. The docker-compose.yml File

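InfluxDB, JMXTrans, and Grafana are deployed together with Docker Compose. Create a KafkaMonitor directory. Inside it, create an influxdb directory, a jmxtrans directory, and a docker-compose.yml file, then place the jmxtrans.json file in the jmxtrans directory.

The docker-compose.yml file is as follows: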

version: '2'
services:
  # JMXTrans service
  jmxtrans:
    image: jmxtrans/jmxtrans
    container_name: jmxtrans
    volumes:
      - ./jmxtrans:/var/lib/jmxtrans
  # InfluxDB service
  influxdb:
    image: influxdb
    container_name: influxdb
    volumes:
      - ./influxdb/conf:/etc/influxdb
      - ./influxdb/data:/var/lib/influxdb/data
      - ./influxdb/meta:/var/lib/influxdb/meta
      - ./influxdb/wal:/var/lib/influxdb/wal
    ports:
      - "8086:8086"  # published port, used by Grafana to reach InfluxDB
    restart: always
  # Grafana service
  grafana:
    image: grafana/grafana:6.5.0  # newer versions may have bugs
    container_name: grafana
    ports:
      - "3000:3000"  # published port for web access

Start the monitoring services:

docker-compose -f docker-compose.yml up -d
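A container can be running while its published port is not yet accepting connections, so it may be worth checking the two ports from the compose file (8086 for InfluxDB, 3000 for Grafana) after startup. The sketch below is one minimal way to do that with the Python standard library; the function names and the choice of `localhost` are assumptions for illustration.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host="localhost", ports=None):
    """Check the ports published by docker-compose.yml (assumed defaults below)."""
    if ports is None:
        ports = {"influxdb": 8086, "grafana": 3000}
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_services()` on the Docker host returns a dictionary that maps each service name to `True` or `False`. A `False` entry for a container that `docker ps` shows as running usually points to a port mapping or network configuration problem rather than a crashed process.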

Then log in to the Grafana web UI, configure the InfluxDB data source, and import the dashboard template.
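The data source can also be created through Grafana's HTTP API (`POST /api/datasources`) instead of the web UI. The following sketch builds such a request with the Python standard library. Several values are assumptions: the data source name `KafkaMonitor` is inferred from the `${DS_KAFKAMONITOR}` placeholder in the template, the database name must match whatever jmxtrans.json writes to, `admin`/`admin` is Grafana's default login on a fresh install, and `http://influxdb:8086` relies on both containers sharing the Compose network.

```python
import base64
import json
import urllib.request


def influxdb_datasource(name, database, url="http://influxdb:8086"):
    """Build the JSON body for an InfluxDB data source proxied through Grafana."""
    return {
        "name": name,
        "type": "influxdb",
        "access": "proxy",  # Grafana's backend queries InfluxDB, not the browser
        "url": url,
        "database": database,
    }


def build_request(grafana_url, payload, user="admin", password="admin"):
    """Prepare (but do not send) the POST request that creates the data source."""
    token = base64.b64encode((user + ":" + password).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_request("http://localhost:3000", influxdb_datasource("KafkaMonitor", "your_database")))`, where `your_database` is a placeholder for the real database name. The dashboard template itself still has to be imported and pointed at this data source.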

6. Viewing the Monitoring Dashboard
