Name | URL | Description |
---|---|---|
Official components | https://prometheus.io/download/ | Includes prometheus, alertmanager, blackbox_exporter, consul_exporter, graphite_exporter, haproxy_exporter, memcached_exporter, mysqld_exporter, node_exporter, pushgateway, statsd_exporter and more |
Other exporters | https://prometheus.io/docs/instrumenting/exporters/ | Look here for anything not listed above |
Grafana dashboard JSON templates | https://grafana.com/grafana/dashboards?search=kafka | Search for Grafana templates that visualize Prometheus data |
Prometheus documentation (Chinese) 1 | https://www.prometheus.wang/exporter/use-promethues-monitor-mysql.html | |
Prometheus documentation (Chinese) 2 | https://prometheus.fuckcloudnative.io/di-san-zhang-prometheus/storage | |
Name | Version | Download | Description |
---|---|---|---|
Operating system | CentOS 7.8 | | |
Prometheus | prometheus-2.24.0.linux-amd64.tar.gz | https://prometheus.io/download/ | Prometheus server |
go | 1.11.4 | https://golang.org/dl/ | Go toolchain |
Grafana | 7.3.7-1 | https://dl.grafana.com/oss/release/grafana-7.3.7-1.x86_64.rpm | Visualization of Prometheus monitoring data |
alertmanager | alertmanager-0.21.0.linux-amd64.tar.gz | https://prometheus.io/download/ | Prometheus alerting component |
blackbox_exporter | blackbox_exporter-0.18.0.linux-amd64.tar.gz | https://prometheus.io/download/ | Black-box probing |
mysqld_exporter | mysqld_exporter-0.12.1.linux-amd64.tar.gz | https://prometheus.io/download/ | MySQL monitoring |
node_exporter | node_exporter-1.0.1.linux-amd64.tar.gz | https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz | Host resource monitoring |
redis_exporter | redis_exporter-v1.15.0.linux-amd64.tar.gz | https://github.com/oliver006/redis_exporter/releases/download/v1.15.0/redis_exporter-v1.15.0.linux-amd64.tar.gz | Redis monitoring |
elasticsearch_exporter | elasticsearch_exporter-1.1.0.linux-amd64.tar.gz | https://github.com/justwatchcom/elasticsearch_exporter/releases/download/v1.1.0/elasticsearch_exporter-1.1.0.linux-amd64.tar.gz | Elasticsearch monitoring |
kafka_exporter | kafka_exporter-1.2.0.linux-amd64.tar.gz | https://github.com/danielqsj/kafka_exporter/releases/download/v1.2.0/kafka_exporter-1.2.0.linux-amd64.tar.gz | Kafka monitoring |
nginx-module-vts | | https://github.com/vozlt/nginx-module-vts.git | Nginx module that exposes Nginx traffic statistics to Prometheus |
nginx-vts-exporter | nginx-vts-exporter-0.9.1.linux-amd64.tar.gz | https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.9.1/nginx-vts-exporter-0.9.1.linux-amd64.tar.gz | Exporter for nginx-module-vts data |
grafana-piechart-panel | latest | https://grafana.com/api/plugins/grafana-piechart-panel/versions/latest/download | Grafana pie chart plugin |
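Prometheus is an open-source service monitoring system and time-series database.
Prometheus works by periodically scraping (pulling) the state of monitored components over HTTP: any component that exposes the corresponding HTTP endpoint can be monitored, with no SDK or other integration work. This makes it a good fit for monitoring virtualized environments such as VMs, Docker and Kubernetes. The HTTP endpoint that exposes a component's metrics is called an exporter. Most components commonly used by internet companies already have ready-made exporters, for example Varnish, HAProxy, Nginx, MySQL and Linux system metrics (disk, memory, CPU, network and so on).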
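To make the pull model concrete: what an exporter's HTTP endpoint returns is plain text in the Prometheus exposition format. The function below is a minimal sketch that prints one gauge sample in that format; the metric name demo_queue_length, its help text and its queue label are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Print one gauge sample in the Prometheus text exposition format.
# Arguments: metric name, help text, label set (without braces), sample value.
print_gauge() {
  local name="$1" help="$2" labels="$3" value="$4"
  printf '# HELP %s %s\n' "$name" "$help"
  printf '# TYPE %s gauge\n' "$name"
  printf '%s{%s} %s\n' "$name" "$labels" "$value"
}

# Hypothetical metric; an exporter serves lines like these at its metrics path.
print_gauge demo_queue_length "Number of jobs waiting in the queue." 'queue="default"' 42
```

Running it prints the HELP and TYPE metadata lines followed by the sample demo_queue_length{queue="default"} 42, which is the kind of text Prometheus parses on every scrape.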
tar -C /usr/local/ -xvf go1.11.4.linux-amd64.tar.gz
vim /etc/profile
------------------------------------------------------------------
export PATH=$PATH:/usr/local/go/bin
------------------------------------------------------------------
# apply the change
source /etc/profile
[root@iZ2zejaz33icbod2k4cvy6Z ~]# go version
go version go1.11.5 linux/amd64
# Aliyun Docker Hub mirror
export REGISTRY_MIRROR=https://registry.cn-hangzhou.aliyuncs.com

# Remove old versions
yum remove -y docker \
  docker-client \
  docker-client-latest \
  docker-ce-cli \
  docker-common \
  docker-latest \
  docker-latest-logrotate \
  docker-logrotate \
  docker-selinux \
  docker-engine-selinux \
  docker-engine

# Set up the yum repository
yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

# Install and start Docker
yum install -y docker-ce-19.03.11 docker-ce-cli-19.03.11 containerd.io-1.2.13

mkdir /etc/docker || true

cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": ["${REGISTRY_MIRROR}"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart Docker
systemctl daemon-reload
systemctl enable docker
systemctl restart docker
mv docker-compose-Linux-x86_64 /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
# verify
docker-compose -v
------------------------------------------------------------------------------------------
docker-compose version 1.27.4, build 40524192
-----------------------------------------------------------------------------------------
tar -zxvf prometheus-2.24.0.linux-amd64.tar.gz
mv prometheus-2.24.0.linux-amd64 /usr/local/Prometheus
nohup /usr/local/Prometheus/prometheus --config.file=/usr/local/Prometheus/prometheus.yml &
The default Prometheus web UI is fairly bare; installing Grafana makes the monitoring data much easier to read.
wget https://dl.grafana.com/oss/release/grafana-7.3.7-1.x86_64.rpm
sudo yum install grafana-7.3.7-1.x86_64.rpm
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable grafana-server.service
sudo /bin/systemctl start grafana-server.service
Grafana cannot display pie charts out of the box, so the piechart plugin has to be installed.
# List the installed plugins
[root@bogon sbin]# /usr/sbin/grafana-cli plugins ls
------------------------------------------------------------------------------------------------------
Restart grafana after installing plugins . <service grafana-server restart>
-----------------------------------------------------------------------------------------------------
# Offline install: unzip the plugin package into the Grafana plugin directory and restart Grafana
[root@bogon resources]# unzip grafana-piechart-panel-5f249d5.zip
[root@bogon resources]# mv grafana-piechart-panel-5f249d5 /var/lib/grafana/plugins/
[root@bogon resources]# service grafana-server restart
Restarting grafana-server (via systemctl):                 [  OK  ]
[root@bogon resources]# /usr/sbin/grafana-cli plugins ls
------------------------------------------------------------------------------------
installed plugins:
grafana-piechart-panel @ 1.3.3

Restart grafana after installing plugins . <service grafana-server restart>
------------------------------------------------------------------------------------
# Alternatively, install online with grafana-cli
[root@bogon plugins]# /usr/sbin/grafana-cli plugins install grafana-piechart-panel
installing grafana-piechart-panel @ 1.6.1
from: https://grafana.com/api/plugins/grafana-piechart-panel/versions/1.6.1/download
into: /var/lib/grafana/plugins

✔ Installed grafana-piechart-panel successfully

Restart grafana after installing plugins . <service grafana-server restart>

[root@bogon plugins]# service grafana-server restart
Restarting grafana-server (via systemctl):                 [  OK  ]
[root@bogon plugins]# /usr/sbin/grafana-cli plugins ls
installed plugins:
grafana-piechart-panel @ 1.6.1

Restart grafana after installing plugins . <service grafana-server restart>
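Alertmanager is a standalone alerting component. It receives alerts from clients such as Prometheus, groups and deduplicates them, and routes them to the right receivers, so that different alerts can go to the owners of different modules according to routing rules. Alertmanager supports email, Slack and other channels, and Chinese IM tools such as DingTalk can be integrated through webhooks.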
tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz
mv /opt/resource/alertmanager-0.21.0.linux-amd64 /usr/local/
vi /usr/local/alertmanager-0.21.0.linux-amd64/alertmanager.yml
-------------------------------------------------------------------------------
global:
  smtp_smarthost: 'smtp.qiye.163.com:25'   # SMTP server address
  smtp_from: 'xxx.com'                     # sender address
  smtp_auth_username: 'xxx.com'            # mailbox user
  smtp_auth_password: 'xxx'                # mailbox password
  smtp_require_tls: false
route:
  receiver: email
receivers:
  - name: 'email'
    email_configs:
      - to: 'xxx.com'
-------------------------------------------------------------------------------
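vi /usr/local/Prometheus/prometheus.yml
--------------------------------------------------------------------------------
# Top-level section: tells Prometheus where to send firing alerts
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['xxx:9093']

# Add to the existing scrape_configs list: scrape Alertmanager's own metrics
scrape_configs:
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['xxx:9093']    # Alertmanager listens on port 9093 by default
--------------------------------------------------------------------------------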
# Start Alertmanager
nohup /usr/local/alertmanager-0.21.0.linux-amd64/alertmanager --config.file=/usr/local/alertmanager-0.21.0.linux-amd64/alertmanager.yml &
First find the Prometheus process with ps -ef|grep prometheus and kill it, then start the service again:
# Restart Prometheus
nohup /usr/local/Prometheus/prometheus --config.file=/usr/local/Prometheus/prometheus.yml &
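Prometheus scrape targets can be configured statically or discovered dynamically. The commonly used mechanisms are:
1) static_configs: static configuration
2) file_sd_configs: file-based service discovery
3) dns_sd_configs: DNS service discovery
4) kubernetes_sd_configs: Kubernetes service discovery
5) consul_sd_configs: Consul service discovery
When monitoring Kubernetes, where pods, services and other resources change constantly, automatic target discovery is where Prometheus shows its biggest advantage.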
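Without Consul service discovery, the Prometheus configuration file has to be edited over and over, which is a real burden for operators. With Consul, the monitored targets only need to be maintained in Consul, and Prometheus discovers them dynamically.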
mkdir -p /data0/consul
cat > /data0/consul/docker-compose.yaml << \EOF
version: '2'
networks:
  byfn:
services:
  consul1:
    image: consul
    container_name: node1
    volumes:
      - /data0/consul/conf_with_acl:/consul/config
      - /data0/consul/consul/node1:/consul/data
    command: agent -server -bootstrap-expect=3 -node=node1 -bind=0.0.0.0 -client=0.0.0.0 -config-dir=/consul/config
    networks:
      - byfn
  consul2:
    image: consul
    container_name: node2
    volumes:
      - /data0/consul/conf_with_acl:/consul/config
      - /data0/consul/consul/node2:/consul/data
    command: agent -server -retry-join=node1 -node=node2 -bind=0.0.0.0 -client=0.0.0.0 -config-dir=/consul/config
    ports:
      - 8500:8500
    depends_on:
      - consul1
    networks:
      - byfn
  consul3:
    image: consul
    volumes:
      - /data0/consul/conf_with_acl:/consul/config
      - /data0/consul/consul/node3:/consul/data
    container_name: node3
    command: agent -server -retry-join=node1 -node=node3 -bind=0.0.0.0 -client=0.0.0.0 -config-dir=/consul/config
    depends_on:
      - consul1
    networks:
      - byfn
  consul4:
    image: consul
    container_name: node4
    volumes:
      - /data0/consul/conf_with_acl:/consul/config
      - /data0/consul/consul/node4:/consul/data
    command: agent -retry-join=node1 -node=node4 -bind=0.0.0.0 -client=0.0.0.0 -ui -config-dir=/consul/config
    ports:
      - 8501:8500
    depends_on:
      - consul2
      - consul3
    networks:
      - byfn
  consul5:
    image: consul
    container_name: node5
    volumes:
      - /data0/consul/conf_without_acl:/consul/config
      - /data0/consul/consul/node5:/consul/data
    command: agent -retry-join=node1 -node=node5 -bind=0.0.0.0 -client=0.0.0.0 -ui -config-dir=/consul/config
    ports:
      - 8502:8500
    depends_on:
      - consul2
      - consul3
    networks:
      - byfn
EOF
cd /data0/consul/
docker-compose up -d
# Register a service
curl -X PUT -d '{"id": "redis","name": "redis","address": "182.92.219.202","port": 9121,"tags": ["service"],"checks": [{"http": "http://182.92.219.202:9121/","interval": "5s"}]}' http://182.92.219.202:8502/v1/agent/service/register
# Query the registered instances of a given service
[root@iZ2zejaz33icbod2k4cvy6Z ~]# curl http://182.92.219.202:8500/v1/catalog/service/redis
[{"ID":"9d76becb-c557-e605-de13-a906ef32497c","Node":"ndoe5","Address":"172.20.0.6","Datacenter":"dc1","TaggedAddresses":{"lan":"172.20.0.6","lan_ipv4":"172.20.0.6","wan":"172.20.0.6","wan_ipv4":"172.20.0.6"},"NodeMeta":{"consul-network-segment":""},"ServiceKind":"","ServiceID":"redis","ServiceName":"redis","ServiceTags":["service"],"ServiceAddress":"182.92.219.202","ServiceTaggedAddresses":{"lan_ipv4":{"Address":"182.92.219.202","Port":9121},"wan_ipv4":{"Address":"182.92.219.202","Port":9121}},"ServiceWeights":{"Passing":1,"Warning":1},"ServiceMeta":{},"ServicePort":9121,"ServiceEnableTagOverride":false,"ServiceProxy":{"MeshGateway":{},"Expose":{}},"ServiceConnect":{},"CreateIndex":458,"ModifyIndex":458}]
# Deregister a service; "redis" is the id of the service to remove
curl -X PUT http://182.92.219.202:8502/v1/agent/service/deregister/redis
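Every exporter in the rest of this article is registered with the same JSON shape, and hand-editing it is error-prone: a wrong id or port silently registers the wrong target. One way to avoid that is a small helper that builds the payload from its parts. This is a sketch; the function names consul_payload and consul_register are hypothetical, and it assumes the ids and addresses contain no quotes or backslashes, since no JSON escaping is done.

```shell
#!/usr/bin/env bash
# Build the Consul agent service-registration payload used throughout this article.
# Arguments: service id (also used as the name), address, port, health-check path (default "/").
# Assumes the values need no JSON escaping.
consul_payload() {
  local id="$1" addr="$2" port="$3" path="${4:-/}"
  printf '{"id": "%s","name": "%s","address": "%s","port": %s,"tags": ["service"],"checks": [{"http": "http://%s:%s%s","interval": "5s"}]}' \
    "$id" "$id" "$addr" "$port" "$addr" "$port" "$path"
}

# Send the payload to a Consul agent's HTTP API.
# Arguments: agent host:port, then the same arguments as consul_payload.
consul_register() {
  local agent="$1"; shift
  curl -sf -X PUT -d "$(consul_payload "$@")" "http://${agent}/v1/agent/service/register"
}
```

For example, consul_register 182.92.219.202:8502 redis 182.92.219.202 9121 sends the same payload as the manual registration above; passing /metrics as a fifth argument gives the health-check path used later for the Spring Boot application.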
vi prometheus.yml
--------------------------------------------------------------------------------------------------------------------------------------------------------------
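# Add the following job under scrape_configs
  - job_name: 'consul-prometheus'
    consul_sd_configs:
      - server: '182.92.219.202:8502'
        services: []    # an empty list means: discover every service registered in Consul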
---------------------------------------------------------------------------------------------------------------------------------------------------------------
# Restart Prometheus
nohup /usr/local/Prometheus/prometheus --config.file=/usr/local/Prometheus/prometheus.yml &
From here on, all new monitoring targets are added through Consul service discovery instead of editing prometheus.yml and restarting Prometheus.
tar -C /usr/local/ -xvf mysqld_exporter-0.12.1.linux-amd64.tar.gz
vi /usr/local/mysqld_exporter-0.12.1.linux-amd64/.my.cnf
----------------------------------------------------------------------
[client]
# MySQL account and password used by the exporter
user=root
password=123456
--------------------------------------------------------------------
nohup /usr/local/mysqld_exporter-0.12.1.linux-amd64/mysqld_exporter --config.my-cnf="/usr/local/mysqld_exporter-0.12.1.linux-amd64/.my.cnf" &
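The .my.cnf above stores a database password in plain text, so it should at least be readable only by its owner. The mysqld_exporter README also suggests a dedicated account (granted PROCESS, REPLICATION CLIENT and SELECT) instead of root. The sketch below writes the file with mode 600; the helper name write_my_cnf and the account name exporter in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Write a mysqld_exporter client config that only its owner can read.
# Arguments: target file, MySQL user, MySQL password.
write_my_cnf() {
  local file="$1" user="$2" password="$3"
  # Subshell keeps the umask change local; a newly created file gets mode 600.
  ( umask 077; printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$file" )
  # Also tighten the mode if the file already existed.
  chmod 600 "$file"
}
```

Usage: write_my_cnf /usr/local/mysqld_exporter-0.12.1.linux-amd64/.my.cnf exporter 'your-password', then start mysqld_exporter with the same --config.my-cnf path as above.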
# Run on the Consul server:
curl -X PUT -d '{"id": "192.168.100.132_mysql","name": "192.168.100.132_mysql","address": "192.168.100.132","port": 9104,"tags": ["service"],"checks": [{"http": "http://192.168.100.132:9104/","interval": "5s"}]}' http://192.168.100.175:8502/v1/agent/service/register
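Visualizing MySQL monitoring data in Grafana:
1) In the Grafana UI, add a data source (Add data source) that points to the Prometheus server, which stores the mysqld_exporter metrics.
2) Import the dashboard JSON template with ID 7362.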
Prometheus alerting rule configuration
Configure the directory for alerting rule files:
vi /usr/local/Prometheus/prometheus.yml
---------------------------------------------------------------------
rule_files:
  - /usr/local/Prometheus/conf/rule/*.yml    # alerting rule files go in this directory
---------------------------------------------------------------------
Configure the concrete alerting rules:
vi /usr/local/Prometheus/conf/rule/mysql.yml
----------------------------------------------------------------------------------------
groups:
  - name: mysql_alert
    rules:
      ### Slow queries ###
      # Default slow-query rule (slow_queries is a counter, so increase() is used; delta() is meant for gauges)
      - alert: mysql_slow_queries_100_in_5m
        expr: floor(increase(mysql_global_status_slow_queries{mysql_addr!~"10.8.6.44:3306|10.8.9.20:3306|10.8.12.212:3306"}[5m])) >= 100
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }} queries], condition held for 3 minutes."

      ### QPS ###
      # Default QPS rule
      - alert: mysql_qps_over_8000
        expr: floor(sum(irate(mysql_global_status_commands_total{group!~"product|product_backend"}[5m])) by (group, role, mysql_addr)) > 8000
        for: 6m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], condition held for 6 minutes."
      # QPS rule for the product databases
      - alert: mysql_qps_over_25000
        expr: floor(sum(irate(mysql_global_status_commands_total{group=~"product|product_backend"}[5m])) by (group, role, mysql_addr)) > 25000
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], condition held for 3 minutes."

      ### Memory ###
      # Default memory rule
      - alert: mysql_memory_99_percent
        expr: mysql_mem_used_rate >= 99
        for: 6m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}%], condition held for 6 minutes."

      ### Disk ###
      # Default disk rule
      - alert: mysql_disk_85_percent
        expr: mysql_disk_used_rate{mysql_addr!~"10.8.161.53:3306|10.8.115.31:3306"} >= 85
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}%], condition held for 3 minutes."
      # 95% disk rule
      - alert: mysql_disk_95_percent
        expr: mysql_disk_used_rate{mysql_addr=~"10.8.161.53:3306|10.8.115.31:3306"} >= 95
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}%], condition held for 3 minutes."

      ### IO limit (disabled) ###
      ## IO limit rule for SSD disks
      # - alert: mysql_ssd_io_limit_warning
      #   expr: (floor(mysql_ioops) >= mysql_disk_total_size * 50 * 0.9) and (mysql_ssd == 1) and on() hour() >= 0 < 16
      #   for: 6m
      #   labels:
      #     severity: warning
      #   annotations:
      #     description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], condition held for 6 minutes."
      ## IO limit rule for standard disks
      # - alert: mysql_standard_disk_io_limit_warning
      #   expr: (floor(mysql_ioops) >= mysql_disk_total_size * 10 * 0.9) and (mysql_ssd == 0) and on() hour() >= 0 < 16
      #   for: 6m
      #   labels:
      #     severity: warning
      #   annotations:
      #     description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], condition held for 6 minutes."

      ### Connections ###
      # Default connection rule
      - alert: mysql_connections_80_percent
        expr: floor(mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100) >= 80
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}%], condition held for 3 minutes."

      ### Running threads ###
      # Default running-threads rule
      - alert: mysql_threads_running_5m_increase_over_150
        expr: floor(delta(mysql_global_status_threads_running{mysql_addr!~"10.8.136.10:3306|10.10.129.116:3306|10.8.67.153:3306"}[5m])) >= 150
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], condition held for 3 minutes."
      # Running-threads rule with a 6-minute duration
      - alert: mysql_threads_running_5m_increase_over_150
        expr: floor(delta(mysql_global_status_threads_running{mysql_addr=~"10.8.136.10:3306|10.10.129.116:3306|10.8.67.153:3306"}[5m])) >= 150
        for: 6m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}], condition held for 6 minutes."

      ### Replication failure ###
      # Default replication rule
      - alert: mysql_replication_broken
        expr: (mysql_slave_status_slave_io_running{role!="master"} == 0) or (mysql_slave_status_slave_sql_running{role!="master"} == 0)
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], replication is broken, condition held for 1 minute."

      ### Replication lag ###
      # Default replication lag rule
      - alert: mysql_replication_lag_over_30s
        expr: floor(mysql_slave_status_seconds_behind_master{mysql_addr!~"10.8.137.173:3306|10.8.11.17:3306|10.8.2.17:3306|10.10.29.6:3306|10.8.61.153:3306"}) >= 30
        for: 3m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}s], condition held for 3 minutes."
      # Rule for instances that normally lag more
      - alert: mysql_replication_lag_over_300s
        expr: floor(mysql_slave_status_seconds_behind_master{mysql_addr=~"10.8.137.173:3306|10.8.11.17:3306|10.10.29.6:3306|10.8.61.153:3306"}) >= 300
        for: 12m
        labels:
          severity: warning
        annotations:
          description: "[{{ $labels.group }}_{{ $labels.role }}], address: [{{ $labels.mysql_addr }}], value: [{{ $value }}s], condition held for 12 minutes."
-------------------------------------------------------------------
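Before deploying a threshold it can help to reproduce a rule's arithmetic by hand. The sketch below mirrors the connection rule, floor(threads_connected / max_connections * 100) >= 80, for one instance; the function names are hypothetical and the sample values (130 connections out of 151, the default max_connections in MySQL 5.7/8.0) are made up for illustration.

```shell
#!/usr/bin/env bash
# Floored connection usage in percent, as in the connection rule above.
# For non-negative integers, (connected * 100) / max in shell integer
# arithmetic equals floor(connected / max * 100).
conn_percent() {
  local connected="$1" max="$2"
  echo $(( connected * 100 / max ))
}

# Exit status 0 when the rule expression would be true (percent >= 80).
conn_alert() {
  [ "$(conn_percent "$1" "$2")" -ge 80 ]
}
```

With 130 of 151 connections in use, conn_percent prints 86 and conn_alert succeeds; with 120 it prints 79 and the rule stays quiet. Because the rule also has for: 3m, Prometheus only fires it after the expression has stayed true for three minutes; until then the alert is pending.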
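View the loaded alerting rules in the Prometheus web UI (Status > Rules); pending and firing alerts are listed on the Alerts page.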
blackbox_exporter is the Prometheus tool for black-box probing over HTTP/HTTPS, TCP, ICMP and DNS.
Download: https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-amd64.tar.gz
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-amd64.tar.gz
tar -C /usr/local/ -xvf blackbox_exporter-0.18.0.linux-amd64.tar.gz
cat /usr/local/blackbox_exporter-0.18.0.linux-amd64/blackbox.yml
---------------------------------------------------------------------------
modules:
  http_2xx:
    prober: http
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
--------------------------------------------------------------------------------
nohup /usr/local/blackbox_exporter-0.18.0.linux-amd64/blackbox_exporter --config.file=/usr/local/blackbox_exporter-0.18.0.linux-amd64/blackbox.yml &
vi /usr/local/Prometheus/prometheus.yml
---------------------------------------------------------------------------
# under scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]    # module name as defined in blackbox.yml
    static_configs:
      - targets:
        - http://baidu.com                       # http
        - https://baidu.com                      # https
        - http://182.92.219.202:8761/actuator/   # an HTTP endpoint on port 8761
        - http://182.92.219.202:8543             # nothing listens on 8543: tests the failure case
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: xxxx:9115    # host running blackbox_exporter (default port 9115)
-----------------------------------------------------------------------
vi /usr/local/Prometheus/prometheus.yml
-------------------------------------------------------------------------
  - job_name: blackbox_tcp
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
        - 192.168.1.2:280
        - 192.168.1.2:7013
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.1.99:9115  # Blackbox exporter.
-------------------------------------------------------------------------
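The relabel_configs in these jobs move each listed target into the target query parameter and point __address__ at the blackbox_exporter, so Prometheus actually scrapes the exporter's /probe endpoint. The helper below (hypothetical name probe_url) sketches the resulting URL for one target; Prometheus URL-encodes the target value, while this sketch leaves it unencoded for readability.

```shell
#!/usr/bin/env bash
# URL that Prometheus scrapes for one blackbox target after relabeling.
# Arguments: blackbox_exporter host:port, module name from blackbox.yml, probe target.
# Note: the target is not URL-encoded here (Prometheus does encode it).
probe_url() {
  local exporter="$1" module="$2" target="$3"
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}
```

To run the same probe by hand: curl -s "$(probe_url 192.168.1.99:9115 tcp_connect 192.168.1.2:280)" | grep '^probe_success'. A value of 1 means the probe succeeded and 0 means it failed.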
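Option 1: hot reload (this endpoint only works if Prometheus was started with the --web.enable-lifecycle flag):
curl -X POST http://xxxx:9090/-/reload
Option 2: find the Prometheus process with ps -ef|grep prometheus, kill it, then start the service again:
# Restart Prometheus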
nohup /usr/local/Prometheus/prometheus --config.file=/usr/local/Prometheus/prometheus.yml &
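[root@iZ2zejaz33icbod2k4cvy6Z rule]# vi /usr/local/Prometheus/conf/rule/http.yml
groups:
  - name: http
    rules:
      - alert: xxx_site_unreachable
        expr: probe_success == 0    # the blackbox probe failed (DNS, connection or unexpected status)
        for: 5m
        labels:
          severity: "error"
        annotations:
          summary: "xxx site unreachable (blackbox probe failed)"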
tar -xvf node_exporter-1.0.1.linux-amd64.tar.gz -C /usr/local/
nohup /usr/local/node_exporter-1.0.1.linux-amd64/node_exporter &
# Run on the Consul server:
curl -X PUT -d '{"id": "182.92.219.202_linux","name": "182.92.219.202_linux","address": "182.92.219.202","port": 9100,"tags": ["service"],"checks": [{"http": "http://182.92.219.202:9100/","interval": "5s"}]}' http://182.92.219.202:8502/v1/agent/service/register
Open Grafana in a browser at http://<server-ip>:3000.
Import a dashboard template via Manage ==> Import ==> Load.
View the monitoring charts.
tar -xvf redis_exporter-v1.16.0.linux-amd64.tar.gz -C /usr/local/
# Specify the Redis address and password
nohup /usr/local/redis_exporter-v1.16.0.linux-amd64/redis_exporter -redis.addr 182.92.219.202:6379 -redis.password 123456 &
# Run on the Consul server (redis_exporter listens on port 9121 by default):
curl -X PUT -d '{"id": "182.92.219.202_redis","name": "182.92.219.202_redis","address": "182.92.219.202","port": 9121,"tags": ["service"],"checks": [{"http": "http://182.92.219.202:9121/","interval": "5s"}]}' http://182.92.219.202:8502/v1/agent/service/register
Open Grafana in a browser at http://<server-ip>:3000.
Import a dashboard template via Manage ==> Import ==> Load.
View the monitoring data.
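Note: the Memory Usage panel always shows N/A. This is because redis_memory_max_bytes is 0 (it reflects the Redis maxmemory setting, which is 0 when no limit is configured), so redis_memory_used_bytes / redis_memory_max_bytes does not yield a valid value.
Fix: replace redis_memory_max_bytes with the server's actual memory size, in bytes, by changing the panel formula to: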
redis_memory_used_bytes{instance=~"$instance"} / 8193428
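The denominator must be in bytes, because redis_memory_used_bytes is in bytes. If the figure is taken from MemTotal in /proc/meminfo, note that this field is reported in kB; 8193428 looks like such a kB value for an 8 GB host, and if that is its origin it needs multiplying by 1024. A sketch of a helper (hypothetical name mem_total_bytes) that does the conversion:

```shell
#!/usr/bin/env bash
# Print total memory in bytes from a /proc/meminfo-style file.
# MemTotal is reported in kB (1024 bytes), so the value is multiplied by 1024.
# %.0f is used instead of %d so large values are not truncated by some awk versions.
mem_total_bytes() {
  local file="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 }' "$file"
}
```

Running mem_total_bytes on the Redis host prints the number to use as the divisor in the panel formula.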
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.6.3</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
    <version>2.4.2</version>
</dependency>
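import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
// Declare inside a @Configuration class: adds application=<spring.application.name> to every metric
@Bean
MeterRegistryCustomizer<MeterRegistry> configurer(
        @Value("${spring.application.name}") String applicationName) {
    return (registry) -> registry.config().commonTags("application", applicationName);
}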
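management:
  endpoints:
    web:                            # expose endpoints over HTTP
      exposure:
        include: 'prometheus'
      path-mapping:
        prometheus: metrics         # serve the prometheus endpoint at /metrics
      base-path: /                  # base path "/" instead of /actuator, so the URL is simply /metrics (convenient for the Consul-based config)
    enabled-by-default: true        # enable endpoints by default
  metrics:
    tags:
      application: ${spring.application.name}   # tag attached to all exported metrics
  health:
    redis:
      enabled: false                # if the project does not use Redis, disable the Redis health check, otherwise startup reports errors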
# Run the following, substituting your own values, e.g. application.name: proDemo
curl -X PUT -d '{"id": "182.92.219.202_{application.name}","name": "182.92.219.202_{application.name}","address": "182.92.219.202","port": 30240,"tags": ["service"],"checks": [{"http": "http://182.92.219.202:30240/metrics","interval": "5s"}]}' http://182.92.219.202:8502/v1/agent/service/register
Open Grafana in a browser at http://<server-ip>:3000 and import a dashboard template via Manage ==> Import ==> Load.
View the dashboard charts.
elasticsearch_exporter download: https://github.com/justwatchcom/elasticsearch_exporter/releases/download/v1.1.0/elasticsearch_exporter-1.1.0.linux-amd64.tar.gz
tar -zxvf elasticsearch_exporter-1.1.0.linux-amd64.tar.gz
mv /opt/resources/elasticsearch_exporter-1.1.0.linux-amd64 /usr/local/elasticsearch_exporter
nohup /usr/local/elasticsearch_exporter/elasticsearch_exporter --es.uri http://182.92.219.202:9200 &
# Run on the Consul server:
curl -X PUT -d '{"id": "182.92.219.202_elasticsearch","name": "182.92.219.202_elasticsearch","address": "182.92.219.202","port": 9114,"tags": ["service"],"checks": [{"http": "http://182.92.219.202:9114/","interval": "5s"}]}' http://182.92.219.202:8502/v1/agent/service/register
Open Grafana in a browser at http://<server-ip>:3000 and import a dashboard template via Manage ==> Import ==> Load.
View the dashboard charts.
yum install git
yum -y install make zlib zlib-devel gcc-c++ libtool openssl openssl-devel
yum -y install epel-release geoip-devel
ldd /usr/local/nginx/sbin/nginx |grep libGeoIP
cd /usr/local
git clone https://github.com/vozlt/nginx-module-vts.git
# Adjust the nginx source directory to your environment; configure must be run from inside the source tree
cd /usr/local/nginx-1.16.1
./configure --add-module=/usr/local/nginx-module-vts --with-http_realip_module --with-http_geoip_module
make
# nginx is already installed, so skip make install
#make install
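The configure line above only lists the vts-related options. When the new binary replaces an nginx that is already installed, the options the old binary was built with generally need to be passed again, or modules compiled into it are lost. nginx -V prints them (to stderr) on a line starting with "configure arguments:". A sketch of a helper (hypothetical name existing_configure_args) that extracts that line:

```shell
#!/usr/bin/env bash
# Read `nginx -V` output on stdin and print only the configure arguments,
# so they can be repeated when rebuilding nginx with an extra module.
existing_configure_args() {
  sed -n 's/^configure arguments: //p'
}
```

Usage: /usr/local/nginx/sbin/nginx -V 2>&1 | existing_configure_args, then append --add-module=/usr/local/nginx-module-vts to the printed options. Arguments that contain quoted values (for example --with-cc-opt='...') are safer to copy by hand than to word-split through the shell.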
# Back up the current nginx binary so it can be rolled back on failure
cp /usr/local/nginx/sbin/nginx /usr/local/nginx/sbin/nginx.back
# Stop nginx, otherwise the binary cannot be replaced
ps -ef|grep nginx
kill -9 pid
# Replace the binary
cp /usr/local/nginx-1.16.1/objs/nginx /usr/local/nginx/sbin/nginx
vi /usr/local/nginx/conf/nginx.conf
---------------------------------------------------------------------------------------------------------------------------
# Add inside the http block:
vhost_traffic_status_zone;
vhost_traffic_status_filter_by_host on;
# Add a status endpoint to the server listening on port 80
location /status {
    vhost_traffic_status_display;
    vhost_traffic_status_display_format html;
}
---------------------------------------------------------------------------------------------------------------------------
vim /usr/local/nginx/conf/nginx.conf
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
http {
    ...
    # Add the GeoIP configuration
    geoip_country /usr/share/GeoIP/GeoIP.dat;
    ...
}
------------------------------------------------------------------------------------------------------------------------------------------------
$ vim /usr/local/nginx/conf/nginx.conf
-------------------------------------------------------------------------
http {
    ...
    vhost_traffic_status_zone;
    vhost_traffic_status_filter_by_host on;
    geoip_country /usr/share/GeoIP/GeoIP.dat;

    vhost_traffic_status_filter_by_set_key $uri uri::$server_name;                       # requests per URI
    vhost_traffic_status_filter_by_set_key $geoip_country_code country::$server_name;   # requests per country/region
    vhost_traffic_status_filter_by_set_key $status $server_name;                         # HTTP status code counts
    vhost_traffic_status_filter_by_set_key $upstream_addr upstream::backend;             # upstream (backend) traffic
    vhost_traffic_status_filter_by_set_key $remote_port client::ports::$server_name;     # requests per client port
    vhost_traffic_status_filter_by_set_key $remote_addr client::addr::$server_name;      # requests per client IP

    server {
        listen 80;
        server_name localhost;

        location /status {
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
        }

        location ~ ^/storage/(.+)/.*$ {
            set $volume $1;
            vhost_traffic_status_filter_by_set_key $volume storage::$server_name;   # requests per storage path
        }
    }
}
-------------------------------------------------------------------------
nginx-vts-exporter download: https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.9.1/nginx-vts-exporter-0.9.1.linux-amd64.tar.gz
tar -zxvf nginx-vts-exporter-0.9.1.linux-amd64.tar.gz
mv /opt/resources/nginx-vts-exporter-0.9.1.linux-amd64 /usr/local/nginx-vts-exporter
nohup /usr/local/nginx-vts-exporter/nginx-vts-exporter -nginx.scrape_timeout 10 -nginx.scrape_uri http://182.92.219.202/status/format/json &
# Run:
curl -X PUT -d '{"id": "182.92.219.202_nginx","name": "182.92.219.202_nginx","address": "182.92.219.202","port": 9913,"tags": ["service"],"checks": [{"http": "http://182.92.219.202:9913/","interval": "5s"}]}' http://192.168.100.175:8502/v1/agent/service/register