Environment:
192.168.1.144: Ubuntu, with Prometheus already deployed (see my earlier tutorial for the deployment steps)
192.168.1.140: CentOS 7, with docker and docker-compose installed
mkdir -p /data/mongo/ #create the MongoDB directory
cd /data/mongo/ #enter it
vim docker-compose.yaml
version: '3'
services:
  mongo:
    image: mongo
    container_name: mongo
    command: [--auth]
    restart: always
    volumes:
      - /data/mongo/db:/data/db
    ports:
      - 27017:27017
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: "123456"
Start it:
docker-compose up -d
Check that it is running:
docker ps
docker exec -it mongo mongosh admin #log in to MongoDB
Then, inside mongosh:
db.auth("root","123456") // authenticate as root
db.createUser({ // create the monitoring user
  user: "exporter",
  pwd: "password",
  roles: [
    { role: "readAnyDatabase", db: "admin" },
    { role: "clusterMonitor", db: "admin" }
  ]
})
db.auth("exporter","password") // a result of ok: 1 means success
exit // leave mongosh
mkdir -p /data/mongo/mongodb_exporter #create the mongodb_exporter installation directory
cd /data/mongo/mongodb_exporter #enter it
vim docker-compose.yaml
version: '3.3'
services:
  mongodb_exporter:
    image: bitnami/mongodb-exporter
    container_name: mongodb_exporter
    restart: always
    environment:
      MONGODB_URI: "mongodb://exporter:password@192.168.1.140:27017/admin?ssl=false"
    command:
      - '--collect-all'
      - '--compatible-mode'
    ports:
      - "9216:9216"
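The exporter reads its credentials from MONGODB_URI. If the monitoring user's password ever contains reserved URI characters such as @, : or /, those characters must be percent-encoded. Below is a small Python sketch of building the URI. It assumes the host, port 27017, the admin auth database and ssl=false from the compose file, and the helper name build_mongodb_uri is hypothetical:

```python
from urllib.parse import quote


def build_mongodb_uri(user: str, password: str, host: str,
                      port: int = 27017, auth_db: str = "admin") -> str:
    """Build a MONGODB_URI string shaped like the one in docker-compose.yaml.

    quote(..., safe="") percent-encodes every reserved character, so a
    password such as "p@ss:w/rd" cannot be mistaken for URI syntax.
    """
    return (f"mongodb://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{auth_db}?ssl=false")


if __name__ == "__main__":
    # Same credentials as the compose file above
    print(build_mongodb_uri("exporter", "password", "192.168.1.140"))
    # Hypothetical password containing reserved characters
    print(build_mongodb_uri("exporter", "p@ss:w/rd", "192.168.1.140"))
```

Paste the printed value into MONGODB_URI, keeping the surrounding double quotes.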
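Start it:
docker-compose up -d
Open http://192.168.1.140:9216/metrics (the mongodb_exporter host's IP plus port 9216) in a browser to confirm the exporter is serving metrics.
On the Prometheus host, open prometheus.yml and add the following under scrape_configs: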
vim prometheus/prometheus.yml
  - job_name: 'mongodb_exporter'
    static_configs:
      - targets: ['192.168.1.140:9216']
        labels:
          instance: docker-server
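Save and exit, then reload the configuration (the /-/reload endpoint only works if Prometheus was started with --web.enable-lifecycle):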
curl -X POST http://localhost:9090/-/reload
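Open the Prometheus host's IP plus port 9090 in a browser (here http://192.168.1.144:9090), choose Status, then Targets, and check that the new target is UP.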
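The same check can be scripted against the Prometheus HTTP API: GET /api/v1/targets returns every active target with its health and lastError. Below is a Python sketch. It assumes the Prometheus address used in this tutorial, the function names are hypothetical, and the demo runs on a hand-written sample payload rather than a live server:

```python
from __future__ import annotations

import json
from urllib.request import urlopen


def fetch_targets(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON body of GET <base_url>/api/v1/targets."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return (job, scrapeUrl, lastError) for each active target that is not 'up'."""
    down = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            down.append((
                target.get("labels", {}).get("job", ""),
                target.get("scrapeUrl", ""),
                target.get("lastError", ""),
            ))
    return down


if __name__ == "__main__":
    # Hand-written sample shaped like an /api/v1/targets response
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "mongodb_exporter"},
                 "scrapeUrl": "http://192.168.1.140:9216/metrics",
                 "health": "up", "lastError": ""},
                {"labels": {"job": "mysql_exporter"},
                 "scrapeUrl": "http://192.168.1.140:9104/metrics",
                 "health": "down", "lastError": "connection refused"},
            ],
            "droppedTargets": [],
        },
    }
    for job, url, err in unhealthy_targets(sample):
        print(f"{job} {url} DOWN: {err}")
```

Against a live server, call unhealthy_targets(fetch_targets("http://192.168.1.144:9090")).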
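Because I had already added a rule_files entry (pointing at the rules directory) to prometheus.yml in the earlier tutorial, adding a MongoDB rules file to that directory is all that is needed.
On the Prometheus host, create mongodb.yml with the following alert rules: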
vim prometheus/rules/mongodb.yml
groups:
  - name: PerconaMongodbExporter
    rules:
      - alert: MongodbDown
        expr: 'mongodb_up == 0'
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "MongoDB down, instance: {{ $labels.instance }}"
          description: "MongoDB is down, current value: {{ $value }}"
      - alert: MongodbNumberCursorsOpen
        expr: 'mongodb_ss_metrics_cursor_open{csr_type="total"} > 10 * 1000'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB too many open cursors, instance: {{ $labels.instance }}"
          description: "MongoDB has too many cursors open for clients (> 10k), current value: {{ $value }}"
      - alert: MongodbCursorsTimeouts
        expr: 'increase(mongodb_ss_metrics_cursor_timedout[1m]) > 100'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB cursor timeouts, instance: {{ $labels.instance }}"
          description: "Too many MongoDB cursors are timing out, current value: {{ $value }}"
      - alert: MongodbTooManyConnections
        expr: 'sum by(instance) (mongodb_ss_connections{conn_type="current"}) / sum by(instance) (mongodb_ss_connections{conn_type=~"current|available"}) * 100 > 80'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB too many connections, instance: {{ $labels.instance }}"
          description: "MongoDB connection usage > 80%, current value: {{ $value }}"
      - alert: MongodbVirtualMemoryUsage
        expr: '(sum(mongodb_ss_mem_virtual) BY (instance) / sum(mongodb_ss_mem_resident) BY (instance)) > 3'
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MongoDB virtual memory usage warning, instance: {{ $labels.instance }}"
          description: "MongoDB virtual memory is more than 3x resident memory, current value: {{ $value }}"
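MongoDB's serverStatus connections section reports current (open connections) and available (unused slots), so the connection limit is their sum. Below is a Python sketch of that percentage and the 80% threshold. It assumes the exporter exposes these fields as mongodb_ss_connections{conn_type="current"} and {conn_type="available"}, and the function names are hypothetical:

```python
def connection_usage_percent(current: float, available: float) -> float:
    """Share of MongoDB's connection capacity in use, in percent."""
    capacity = current + available  # total slots = open + still unused
    if capacity <= 0:
        raise ValueError("current + available must be positive")
    return current / capacity * 100


def too_many_connections(current: float, available: float,
                         threshold: float = 80.0) -> bool:
    """True when usage exceeds the 80% threshold used for MongodbTooManyConnections."""
    return connection_usage_percent(current, available) > threshold


if __name__ == "__main__":
    # 850 open connections, 150 free slots
    print(connection_usage_percent(850, 150))
    print(too_many_connections(850, 150))
```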
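Check the configuration:
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
Reload the configuration:
curl -X POST http://localhost:9090/-/reload
Open the Prometheus host's IP plus port 9090 in a browser and choose Alerts to see the new rules.
Find a MongoDB dashboard on https://grafana.com/grafana/dashboards and copy its ID.
Import the dashboard in Grafana.
Done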
Create the cAdvisor directory:
mkdir /data/cadvisor
cd /data/cadvisor
vim docker-compose.yaml
version: '3.3'
services:
  cadvisor:
    image: google/cadvisor
    container_name: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - 8080:8080
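Start it:
docker-compose up -d
Open the cAdvisor host's IP plus port 8080 in a browser (here http://192.168.1.140:8080).
vim prometheus/prometheus.yml
Because I had already configured Prometheus to monitor Docker, I only add the following target to the existing job's static_configs. Note that cadvisor:8080 only resolves when cAdvisor runs in the same Docker network as Prometheus; to scrape the instance deployed above on 192.168.1.140, use '192.168.1.140:8080' instead.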
      - targets: ['cadvisor:8080']
        labels:
          instance: prometheus-server
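After editing, reload the configuration:
curl -X POST http://localhost:9090/-/reload
My prometheus.yml already has a rule_files entry for the rules directory, so creating a Docker rules file there is enough.
My Prometheus directory is /data/docker-prometheus/prometheus.
Go to the rules directory and add the following: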
cd /data/docker-prometheus/prometheus/rules/
vim docker.yml
groups:
  - name: DockerContainers
    rules:
      - alert: Containerkilled
        expr: time() - container_last_seen > 60
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Docker container killed, instance: {{ $labels.instance }}"
          description: "A container has disappeared (last seen {{ $value }} seconds ago)"
      - alert: ContainerAbsent
        expr: absent(container_last_seen)
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "No Docker containers, instance: {{ $labels.instance }}"
          description: "No container has been seen for 5 minutes, current value: {{ $value }}"
      - alert: ContainerCpuUsage
        expr: (sum(rate(container_cpu_usage_seconds_total{name!=""}[3m])) BY (instance, name) * 100) > 300
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Docker container CPU usage alert, instance: {{ $labels.instance }}"
          description: "Container CPU usage is above 300%, current value: {{ $value }}"
      - alert: ContainerMemoryUsage
        expr: (sum(container_memory_working_set_bytes{name!=""}) BY (instance, name) / sum(container_spec_memory_limit_bytes > 0) BY (instance, name) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Docker container memory usage alert, instance: {{ $labels.instance }}"
          description: "Container memory usage is above 80%, current value: {{ $value }}"
      - alert: ContainerVolumeIoUsage
        expr: (sum(container_fs_io_current{name!=""}) BY (instance, name) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Container storage I/O usage alert, instance: {{ $labels.instance }}"
          description: "Container storage I/O usage is above 80%, current value: {{ $value }}"
      - alert: ContainerHighThrottleRate
        expr: rate(container_cpu_cfs_throttled_seconds_total[3m]) > 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Container CPU throttling alert, instance: {{ $labels.instance }}"
          description: "The container is being CPU-throttled, current value: {{ $value }}"
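The CPU and memory rules are plain arithmetic on cAdvisor series. CPU usage is the per-second increase of container_cpu_usage_seconds_total times 100, so 300% means about three cores busy. Memory usage is working set over limit, and containers without a limit (which cAdvisor typically reports as 0, an assumption worth checking on your host) are dropped by `> 0` and never fire. Below is a Python sketch of both calculations with hypothetical function names:

```python
from __future__ import annotations


def cpu_usage_percent(prev_seconds: float, curr_seconds: float,
                      interval_seconds: float) -> float:
    """CPU seconds consumed per wall-clock second, times 100.

    Simplified rate(): two samples only, no counter-reset handling.
    """
    return (curr_seconds - prev_seconds) / interval_seconds * 100


def memory_usage_percent(working_set_bytes: float,
                         limit_bytes: float) -> float | None:
    """Working set as a percentage of the container's memory limit.

    Returns None when there is no limit (limit <= 0), mirroring how the
    `container_spec_memory_limit_bytes > 0` filter drops such containers.
    """
    if limit_bytes <= 0:
        return None
    return working_set_bytes / limit_bytes * 100


if __name__ == "__main__":
    # 180 CPU-seconds used over 60 s of wall-clock time: three cores busy
    print(cpu_usage_percent(100.0, 280.0, 60.0))
    print(memory_usage_percent(900 * 2**20, 1024 * 2**20))
    print(memory_usage_percent(900 * 2**20, 0))
```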
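Check the configuration:
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
Reload the configuration:
curl -X POST http://localhost:9090/-/reload
Open the Prometheus host's IP plus port 9090 in a browser and choose Alerts to check.
Grafana dashboards: https://grafana.com/grafana/dashboards
Done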
Note: docker and docker-compose are already installed.
mkdir /data/mysql #create the MySQL directory
cd /data/mysql
#create the docker-compose.yaml file
vim docker-compose.yaml
version: '3.1'
services:
  db:
    image: mysql:8.0 # --default-authentication-plugin no longer exists in MySQL 8.4+, so pin 8.0
    restart: always
    container_name: mysql
    environment:
      TZ: Asia/Shanghai
      LANG: en_US.UTF-8
      MYSQL_ROOT_PASSWORD: sy123456
    command:
      - '--default-authentication-plugin=mysql_native_password'
      - '--character-set-server=utf8mb4'
      - '--collation-server=utf8mb4_general_ci'
      - '--lower_case_table_names=1'
      - '--performance_schema=1'
    volumes:
      - /data/mysql/data/:/var/lib/mysql
    ports:
      - 3306:3306
docker-compose up -d
docker-compose exec db mysql --version
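Because MySQL was installed with docker-compose, create the user from inside the container.
Log in to MySQL:
docker exec -it mysql mysql -uroot -psy123456
Create the monitoring user:
CREATE USER 'exporter'@'%' IDENTIFIED BY 'password' WITH MAX_USER_CONNECTIONS 3;
Explanation:
This creates a user named 'exporter' with the password 'password' and limits it to at most 3 simultaneous connections.
Grant the monitoring user its privileges:
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
Explanation:
This grants 'exporter' the PROCESS, REPLICATION CLIENT and SELECT privileges on all databases (*.*); the '%' host part means the user may connect from any host.
Exit MySQL:
quit
Verify that the new user can log in:
docker exec -it mysql mysql -uexporter -ppassword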
Create the mysqld_exporter installation directory:
mkdir /data/mysqld_exporter -p
cd /data/mysqld_exporter
Create the docker-compose.yaml file:
vim docker-compose.yaml
version: '3.3'
services:
  mysqld-exporter:
    image: prom/mysqld-exporter
    container_name: mysqld-exporter
    restart: always
    command:
      - '--collect.info_schema.processlist'
      - '--collect.info_schema.innodb_metrics'
      - '--collect.info_schema.tablestats'
      - '--collect.info_schema.tables'
      - '--collect.info_schema.userstats'
      - '--collect.engine_innodb_status'
    environment:
      - DATA_SOURCE_NAME=exporter:password@(192.168.1.140:3306)/
    ports:
      - 9104:9104
docker-compose up -d
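In my case the exporter's log showed an error at this point (screenshot not reproduced; view it with docker logs mysqld-exporter). mysqld_exporter v0.15.0 and later no longer read DATA_SOURCE_NAME, so the credentials are moved into a my.cnf file instead.
Create a my.cnf file in the current directory: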
vim my.cnf
[client]
user=exporter
password=password
host=192.168.1.140
port=3306
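Before mounting my.cnf into the container, it can help to check that the [client] section has every key the exporter needs. Below is a Python sketch using configparser. It only approximates MySQL's option-file syntax (an assumption that holds for simple files like this one), and check_client_section is a hypothetical name:

```python
import configparser

REQUIRED_KEYS = ("user", "password", "host", "port")

# Same content as the my.cnf above
SAMPLE = """[client]
user=exporter
password=password
host=192.168.1.140
port=3306
"""


def check_client_section(text: str) -> dict:
    """Parse option-file text and return the [client] settings.

    interpolation=None keeps '%' in passwords literal, and
    allow_no_value=True tolerates bare MySQL options such as 'skip-ssl'.
    """
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    parser.read_string(text)
    if not parser.has_section("client"):
        raise ValueError("no [client] section")
    client = parser["client"]
    missing = [key for key in REQUIRED_KEYS if not client.get(key)]
    if missing:
        raise ValueError(f"missing keys in [client]: {', '.join(missing)}")
    return {
        "user": client["user"],
        "password": client["password"],
        "host": client["host"],
        "port": client.getint("port"),
    }


if __name__ == "__main__":
    print(check_client_section(SAMPLE))
```

On the server, pass the real file's contents instead, e.g. check_client_section(open("my.cnf").read()).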
Edit docker-compose.yaml:
vim docker-compose.yaml
version: '3.3'
services:
  mysqld-exporter:
    image: prom/mysqld-exporter
    container_name: mysqld-exporter
    restart: always
    command:
      - '--collect.info_schema.processlist'
      - '--collect.info_schema.innodb_metrics'
      - '--collect.info_schema.tablestats'
      - '--collect.info_schema.tables'
      - '--collect.info_schema.userstats'
      - '--collect.engine_innodb_status'
    volumes:
      - ./my.cnf:/.my.cnf #mount it at /.my.cnf in the container's root directory, otherwise the exporter reports an error
    ports:
      - 9104:9104
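Recreate the container so the new volume mount takes effect (docker-compose restart only restarts the existing container and does not apply changes made to docker-compose.yaml):
docker-compose up -d
Open this host's IP plus port 9104 in a browser to check the metrics.
On the Prometheus host, open prometheus.yml and append the following: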
vim prometheus/prometheus.yml
  - job_name: 'mysql_exporter'
    static_configs:
      - targets: ['192.168.1.140:9104']
        labels:
          instance: docker-server
curl -X POST http://localhost:9090/-/reload
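Open the Prometheus host's IP plus port 9090, choose Status, then Targets to check.
My prometheus.yml already has a rule_files entry for the rules directory, so creating a MySQL rules file there is enough.
My Prometheus directory is /data/docker-prometheus/prometheus.
Go to the rules directory and create the mysqld rules file: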
vim mysqld.yml
groups:
  - name: MySQL
    rules:
      - alert: MysqlDown
        expr: mysql_up == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "MySQL down, instance: {{ $labels.instance }}"
          description: "mysqld_exporter cannot connect to MySQL, current status: {{ $value }}"
      - alert: MysqlTooManyConnections
        expr: max_over_time(mysql_global_status_threads_connected[1m]) / mysql_global_variables_max_connections * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Too many MySQL connections, instance: {{ $labels.instance }}"
          description: "MySQL connections > 80%, current value: {{ $value }}"
      - alert: MysqlHighThreadsRunning
        expr: max_over_time(mysql_global_status_threads_running[1m]) > 20
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Too many running MySQL threads, instance: {{ $labels.instance }}"
          description: "More than 20 MySQL threads running, current running threads: {{ $value }}"
      - alert: MysqlSlowQueries
        expr: increase(mysql_global_status_slow_queries[2m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MySQL slow query alert, instance: {{ $labels.instance }}"
          description: "MySQL logged {{ $value }} new slow queries in the past two minutes"
      - alert: MysqlInnodbLogWaits
        expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "MySQL InnoDB log waits, instance: {{ $labels.instance }}"
          description: "MySQL InnoDB log writes are stalling, current value: {{ $value }}"
      - alert: MysqlRestarted
        expr: mysql_global_status_uptime < 60
        for: 0m
        labels:
          severity: info
        annotations:
          summary: "MySQL restarted, instance: {{ $labels.instance }}"
          description: "MySQL restarted less than a minute ago"
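The slow-query rule relies on increase(), which treats any drop in a counter as a reset (for example after MySQL restarts). Below is a Python sketch of that reset handling and of the connection ratio (max of threads_connected over the window divided by max_connections). It skips the extrapolation PromQL applies at the window edges, and the function names are hypothetical:

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase across consecutive counter samples.

    A value lower than its predecessor is treated as a counter reset, so
    the new value itself counts as the increase since the reset.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def connection_usage_percent(threads_connected: Sequence[float],
                             max_connections: float) -> float:
    """max_over_time(threads_connected) / max_connections * 100."""
    return max(threads_connected) / max_connections * 100


if __name__ == "__main__":
    print(counter_increase([10, 12, 15]))
    print(counter_increase([10, 12, 3, 5]))  # reset between 12 and 3
    print(connection_usage_percent([100, 130, 120], 151))
```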
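Check the configuration:
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
Reload the configuration:
curl -X POST http://localhost:9090/-/reload
Open the Prometheus host's IP plus port 9090 and choose Alerts to check.
Grafana dashboards: https://grafana.com/grafana/dashboards
In Grafana, choose Starred in the top-left, then New in the top-right, then Import; the dashboard import page opens.
Done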
My server already has docker and docker-compose installed.
Create the process_exporter directory:
mkdir /data/process_exporter -p
cd /data/process_exporter
Create the configuration file:
vim process.yml
process_names:
  - name: "{{.Comm}}"
    cmdline:
      - '.+'
Alternatively, monitor only specific processes; use either this config or the one above:
vim process.yml
process_names:
  - name: "{{.Matches}}" #name template
    cmdline:
      - 'nginx' #regex that identifies the process
  - name: "{{.Matches}}"
    cmdline:
      - 'mongod'
  - name: "{{.Matches}}"
    cmdline:
      - 'mysqld'
  - name: "{{.Matches}}"
    cmdline:
      - 'redis-server'
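When the list of processes grows, the second form can be generated rather than typed. Below is a Python sketch that writes one {{.Matches}} stanza per cmdline regex. render_process_config is a hypothetical helper, and single quotes are doubled because the patterns are emitted as single-quoted YAML strings:

```python
from __future__ import annotations


def render_process_config(patterns: list[str]) -> str:
    """Render a process-exporter process_names config, one group per regex."""
    lines = ["process_names:"]
    for pattern in patterns:
        quoted = pattern.replace("'", "''")  # YAML single-quote escaping
        lines.append('  - name: "{{.Matches}}"')
        lines.append("    cmdline:")
        lines.append(f"      - '{quoted}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_process_config(["nginx", "mongod", "mysqld", "redis-server"]), end="")
```

Redirect the printed output into process.yml to use it.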
docker run -d --rm -p 9256:9256 --privileged -v /proc:/host/proc -v `pwd`:/config --name process-exporter ncabatoff/process-exporter --procfs /host/proc -config.path /config/process.yml
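Explanation:
-p 9256:9256 publishes the exporter's port. -v /proc:/host/proc mounts the host's /proc into the container, --privileged runs the container with extended privileges, and --procfs /host/proc tells process-exporter to read process data from that path. -v `pwd`:/config mounts the current directory so that -config.path /config/process.yml finds the config file. --rm removes the container once it stops.
Open this host's IP plus port 9256 in a browser to check that process data is being collected.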
On the Prometheus host, open prometheus.yml and append the following:
vim prometheus/prometheus.yml
  - job_name: 'process_exporter'
    scrape_interval: 30s
    scrape_timeout: 15s
    static_configs:
      - targets: ['192.168.1.140:9256']
curl -X POST http://localhost:9090/-/reload #reload the configuration
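Open the Prometheus host's IP plus port 9090, choose Status, then Targets to check.
My prometheus.yml already has a rule_files entry for the rules directory, so creating a process rules file there is enough.
My Prometheus directory is /data/docker-prometheus/prometheus.
Go to the rules directory and create the process rules file: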
vim process.yml
groups:
  - name: process
    rules:
      - alert: TooManyProcesses
        expr: sum(namedprocess_namegroup_states) by (instance) > 1000
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "The server currently has {{ $value }} processes"
      - alert: ZombieProcesses
        expr: sum by(instance, groupname) (namedprocess_namegroup_states{state="Zombie"}) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Process group {{ $labels.groupname }} has {{ $value }} zombie processes"
      - alert: ProcessRestarted
        expr: ceil(time() - max by (instance, groupname) (namedprocess_namegroup_oldest_start_time_seconds)) < 60
        for: 15s
        labels:
          severity: warning
        annotations:
          description: "Process {{ $labels.groupname }} restarted {{ $value }} seconds ago"
      - alert: ProcessExited
        expr: max by(instance, groupname) (delta(namedprocess_namegroup_oldest_start_time_seconds{groupname=~"^java.*|^nginx.*"}[1d])) < 0
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "Process {{ $labels.groupname }} has exited"
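The process-restart rule works because namedprocess_namegroup_oldest_start_time_seconds is a Unix timestamp, so now minus the newest of those start times is the group's age in seconds. Below is a Python sketch of that check. The function names are hypothetical, and `now` is passed explicitly so the logic can be tested:

```python
import math
import time
from typing import Iterable, Optional


def seconds_since_start(oldest_start_times: Iterable[float],
                        now: Optional[float] = None) -> int:
    """ceil(now - max(start timestamps)), as in the restart rule's expr."""
    current = time.time() if now is None else now
    return math.ceil(current - max(oldest_start_times))


def restarted_recently(oldest_start_times: Iterable[float],
                       now: Optional[float] = None,
                       window_seconds: int = 60) -> bool:
    """True if the process group (re)started less than window_seconds ago."""
    return seconds_since_start(oldest_start_times, now) < window_seconds


if __name__ == "__main__":
    started = time.time() - 42  # hypothetical group started about 42 s ago
    print(seconds_since_start([started]), restarted_recently([started]))
```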
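Check the configuration:
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
Reload the configuration:
curl -X POST http://localhost:9090/-/reload
Open the Prometheus host's IP plus port 9090 and choose Alerts to check.
Grafana dashboards: https://grafana.com/grafana/dashboards
In Grafana, choose Starred in the top-left, then New in the top-right, then Import; the dashboard import page opens.
Done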
Start domain_exporter on 192.168.1.140:
docker run -d --restart=always --name domain_exporter -p 9222:9222 caarlos0/domain_exporter
On the Prometheus host, open the configuration file and add the following:
vim prometheus/prometheus.yml
  - job_name: domain
    #scrape_interval: 1h
    scrape_interval: 15s
    metrics_path: /probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 192.168.1.140:9222
    static_configs:
      - targets:
          - qq.com
          - baidu.cn
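The two relabel_configs entries turn each listed domain into a query parameter and point the actual scrape at domain_exporter. Below is a Python sketch that mimics those two steps for one target. It is a simplification of Prometheus relabeling, and the function names are hypothetical:

```python
from urllib.parse import urlencode


def relabel(domain: str, exporter: str = "192.168.1.140:9222") -> dict:
    """Apply the domain job's two relabel rules to one static target."""
    labels = {"__address__": domain}
    # Rule 1: copy __address__ (the domain) into __param_target
    labels["__param_target"] = labels["__address__"]
    # Rule 2: no source_labels, so the replacement overwrites __address__
    labels["__address__"] = exporter
    return labels


def scrape_url(labels: dict, metrics_path: str = "/probe") -> str:
    """Build the URL Prometheus scrapes; __param_* labels become query parameters."""
    params = {name[len("__param_"):]: value
              for name, value in labels.items() if name.startswith("__param_")}
    return f"http://{labels['__address__']}{metrics_path}?{urlencode(params)}"


if __name__ == "__main__":
    for domain in ("qq.com", "baidu.cn"):
        print(scrape_url(relabel(domain)))
```

No rule sets the instance label, so Prometheus fills it from the final __address__. Every target in this job therefore ends up with instance="192.168.1.140:9222", and the domain rules below identify the domain through the exporter's domain label rather than through instance.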
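curl -X POST http://localhost:9090/-/reload #reload the configuration
Open the Prometheus host's IP plus port 9090, choose Status, then Targets to check.
My prometheus.yml already has a rule_files entry for the rules directory, so creating a domain rules file there is enough.
My Prometheus directory is /data/docker-prometheus/prometheus.
Go to the rules directory and create the domain rules file: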
vim domain.yml
groups:
  - name: domain
    rules:
      - alert: DomainProbeFailed
        expr: domain_probe_success == 0
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}'
          description: 'Probe failed for domain {{ $labels.domain }}'
      - alert: DomainExpiring
        expr: domain_expiry_days < 30
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: '{{ $labels.instance }}'
          description: '{{ $labels.domain }} will expire in less than 30 days'
      - alert: DomainExpiring
        expr: domain_expiry_days < 5
        for: 2h
        labels:
          severity: page
        annotations:
          summary: '{{ $labels.instance }}'
          description: '{{ $labels.domain }} will expire in less than 5 days'
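Prometheus evaluates each rule independently, so a domain with fewer than 5 days left fires both the 30-day warning and the 5-day page alert. Below is a Python sketch of that tiering. firing_severities is a hypothetical helper, and the `for: 2h` delay is ignored:

```python
from __future__ import annotations

# (threshold in days, severity) pairs taken from the rules file above
EXPIRY_RULES = ((30, "warning"), (5, "page"))


def firing_severities(expiry_days: float) -> list[str]:
    """Severities of every expiry rule whose `domain_expiry_days < N` condition holds."""
    return [severity for threshold, severity in EXPIRY_RULES
            if expiry_days < threshold]


if __name__ == "__main__":
    for days in (90, 20, 3):
        print(days, firing_severities(days))
```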
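Check the configuration:
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
Reload the configuration:
curl -X POST http://localhost:9090/-/reload
Open the Prometheus host's IP plus port 9090 and choose Alerts to check.
Grafana dashboards: https://grafana.com/grafana/dashboards
In Grafana, choose Starred in the top-left, then New in the top-right, then Import; the dashboard import page opens.
Done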
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.