created by fangchangtan | 2020/2/24
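ElastAlert is the component that raises alerts on log keywords in the ELK stack. The basic workflow is to use ELK logs to collect a program's continuous heartbeats and to capture error-log keywords such as ERROR. From these, you get monitoring alerts on the program's health and stability.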
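The heartbeat part of this workflow can be pictured as a flatline-style check: if too few heartbeat events arrive within a time window, the program is presumed unhealthy. The following is a minimal Python sketch under stated assumptions, not ElastAlert's code. The helper `heartbeat_missing` is hypothetical, and the heartbeat timestamps are an in-memory list rather than the results of an Elasticsearch query.

```python
from datetime import datetime, timedelta


def heartbeat_missing(heartbeats, now, timeframe, threshold):
    """Return True if fewer than `threshold` heartbeats fall in (now - timeframe, now].

    Hypothetical helper modelling a flatline-style rule; in ElastAlert the
    events would come from an Elasticsearch query, not a Python list.
    """
    window_start = now - timeframe
    recent = [ts for ts in heartbeats if window_start < ts <= now]
    return len(recent) < threshold


if __name__ == "__main__":
    now = datetime(2020, 2, 24, 15, 0, 0)
    beats = [now - timedelta(seconds=s) for s in (5, 20, 35, 50)]
    # Four heartbeats in the last minute: healthy with a threshold of 3
    print(heartbeat_missing(beats, now, timedelta(minutes=1), 3))  # False
    # The only heartbeat is five minutes old: alert
    print(heartbeat_missing([now - timedelta(minutes=5)], now, timedelta(minutes=1), 1))  # True
```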
## Clone the repository with git
git clone https://github.com/bitsensor/elastalert.git
## Change to the directory
cd elastalert
Run the following from inside the elastalert directory (this is the officially recommended installation method).
# Start the elastalert container
sudo docker run --rm -p 3030:3030 \
-v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v `pwd`/config/elastalert-test.yaml:/opt/elastalert/config-test.yaml \
-v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
-v `pwd`/rules:/opt/elastalert/rules \
-v `pwd`/rule_templates:/opt/elastalert/rule_templates \
--net="host" \
--name elastalert-fct2 bitsensor/elastalert:2.0.0
Alternatively, use the production installation method (recommended):
# Production environment: start elastalert
docker run --rm \
--name fct-elastalert \
--net "host" \
-p 3030:3030 \
-v /data/poc/trial-production/myelastalert/elastalert/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/config/config.json:/opt/elastalert-server/config/config.json \
-v /data/poc/trial-production/myelastalert/elastalert/rules:/opt/elastalert/rules \
-v /data/poc/trial-production/myelastalert/elastalert/rule_templates:/opt/elastalert/rule_templates \
-v /data/poc/trial-production/myelastalert/elastalert/config/smtp_auth.yaml:/opt/elastalert/config/smtp_auth.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/server_data:/opt/elastalert/server_data \
-v /data/poc/trial-production/myelastalert/elastalert/logs:/opt/logs \
bitsensor/elastalert:2.0.0
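When several `-v` options are maintained by hand, it is easy to mount the same container path twice, and Docker refuses to start a container with a duplicate mount point. The Python sketch below assembles the production command from a host-to-container mapping and rejects duplicate targets. `build_docker_run` and `MOUNTS` are hypothetical names. `-p 3030:3030` is left out because Docker discards published ports in host network mode.

```python
import shlex
from collections import Counter

# Host directory used in the production command above
BASE = "/data/poc/trial-production/myelastalert/elastalert"

# (host path relative to BASE, path inside the container), as in the command above
MOUNTS = [
    ("config/elastalert.yaml", "/opt/elastalert/config.yaml"),
    ("config/config.json", "/opt/elastalert-server/config/config.json"),
    ("rules", "/opt/elastalert/rules"),
    ("rule_templates", "/opt/elastalert/rule_templates"),
    ("config/smtp_auth.yaml", "/opt/elastalert/config/smtp_auth.yaml"),
    ("server_data", "/opt/elastalert/server_data"),
    ("logs", "/opt/logs"),
]


def build_docker_run(base, mounts, name="fct-elastalert", image="bitsensor/elastalert:2.0.0"):
    """Build the `docker run` argument list, refusing duplicate container paths."""
    dupes = [dst for dst, n in Counter(dst for _, dst in mounts).items() if n > 1]
    if dupes:
        # Docker rejects a second -v that targets the same container path
        raise ValueError(f"duplicate mount point(s): {dupes}")
    args = ["docker", "run", "--rm", "--name", name, "--net", "host"]
    for src, dst in mounts:
        args += ["-v", f"{base}/{src}:{dst}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # Print a copy-pasteable shell command
    print(shlex.join(build_docker_run(BASE, MOUNTS)))
```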
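The config.json file mainly configures the Elasticsearch address to connect to, the paths of the rules and rule_templates directories, and the name of the Elasticsearch index to write to: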
{
  "appName": "elastalert-server",
  "port": 3030,
  "wsport": 3333,
  "elastalertPath": "/opt/elastalert",
  "verbose": false,
  "es_debug": false,
  "debug": false,
  "rulesPath": {
    "relative": true,
    "path": "/rules"
  },
  "templatesPath": {
    "relative": true,
    "path": "/rule_templates"
  },
  "es_host": "172.19.32.106",
  "es_port": 9202,
  "writeback_index": "elastalert_status"
}
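To see how these fields fit together, here is a minimal Python sketch that parses config.json and derives the Elasticsearch URL and the rule and template directories. It rests on a few assumptions:
- `"relative": true` means the path is resolved against `elastalertPath` (our reading of the bitsensor server README).
- Elasticsearch is reached over plain HTTP, consistent with `use_ssl: False` in elastalert.yaml.
- `load_server_config` is a hypothetical helper.

```python
import json

REQUIRED_KEYS = ("es_host", "es_port", "writeback_index", "rulesPath", "templatesPath")


def load_server_config(text):
    """Parse config.json text; return (es_url, rules_dir, templates_dir, writeback_index)."""
    cfg = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    if missing:
        raise KeyError(f"config.json is missing: {missing}")
    # Assumes plain HTTP, matching use_ssl: False
    es_url = f"http://{cfg['es_host']}:{cfg['es_port']}"

    def resolve(entry):
        # Assumption: "relative": true means relative to elastalertPath
        base = cfg.get("elastalertPath", "") if entry.get("relative") else ""
        return base + entry["path"]

    return es_url, resolve(cfg["rulesPath"]), resolve(cfg["templatesPath"]), cfg["writeback_index"]
```

Fed the JSON above, it should return `http://172.19.32.106:9202`, `/opt/elastalert/rules`, `/opt/elastalert/rule_templates` and `elastalert_status`. The two directories are the same container paths the `docker run` commands mount.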
The elastalert.yaml configuration is as follows:
# The elasticsearch hostname for metadata writeback
# Note that every rule can have its own elasticsearch host
es_host: 172.19.32.106
# The elasticsearch port
es_port: 9202
# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
rules_folder: rules
# How often ElastAlert will query elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  seconds: 5
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 1
# Optional URL prefix for elasticsearch
#es_url_prefix: elasticsearch
# Connect with TLS to elasticsearch
#use_ssl: True
use_ssl: False
# Verify TLS certificates
#verify_certs: True
verify_certs: False
# GET request with body is the default option for Elasticsearch.
# If it fails for some reason, you can pass 'GET', 'POST' or 'source'.
# See http://elasticsearch-py.readthedocs.io/en/master/connection.html?highlight=send_get_body_as#transport
# for details
#es_send_get_body_as: GET
# Option basic-auth username and password for elasticsearch
#es_username: someusername
#es_password: somepassword
# The index on es_host which is used for metadata storage
# This can be a unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 2
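`run_every` and `buffer_time` interact as follows: ElastAlert queries every `run_every`, and each query looks back over `buffer_time`. Consecutive windows therefore overlap, and late-arriving logs are still picked up. The sketch below is a simplified Python model of that behaviour. It ignores details such as resuming from the last run recorded in `writeback_index`. `query_windows` is a hypothetical name.

```python
from datetime import datetime, timedelta

RUN_EVERY = timedelta(seconds=5)    # run_every: seconds: 5
BUFFER_TIME = timedelta(minutes=1)  # buffer_time: minutes: 1


def query_windows(start, runs, run_every=RUN_EVERY, buffer_time=BUFFER_TIME):
    """Yield (window_start, window_end) for `runs` consecutive runs.

    Simplified model: each run looks back `buffer_time` from the run time,
    so consecutive windows overlap and late-arriving logs are still seen.
    """
    for i in range(runs):
        end = start + i * run_every
        yield end - buffer_time, end


if __name__ == "__main__":
    for lo, hi in query_windows(datetime(2020, 2, 24, 15, 0, 0), 3):
        print(lo.time(), "->", hi.time())
```

With these settings, each 5-second run re-examines the last full minute of logs.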
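There is also an elastalert-test.yaml file. It is used only when you test rules through the API, and it lets you write to a different writeback index when testing different rules for different examples.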
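For reference, a rule test through the server API might be requested as below. This is a sketch under assumptions, and the request is only built, not sent:
- The `POST /test` endpoint and the `rule`/`options` body (`testType`, `days`, `alert`) follow our reading of the bitsensor/elastalert README; check them against the 2.0.0 image.
- The server address `172.19.32.106:3030` is assumed from the port in config.json and the test host.

```python
import json
import urllib.request

# ASSUMPTION: server address and the POST /test body shape (from our reading of
# the bitsensor/elastalert README); verify against your image version.
SERVER = "http://172.19.32.106:3030"


def build_rule_test_request(rule_yaml, days=1, server=SERVER):
    """Build (but do not send) a request asking the server to test a rule."""
    body = {
        "rule": rule_yaml,
        "options": {
            "testType": "all",  # assumed values: "all", "schemaOnly", "countOnly"
            "days": days,       # how many days of data to test against
            "alert": False,     # do not send real alerts while testing
        },
    }
    return urllib.request.Request(
        f"{server}/test",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_rule_test_request("name: fct-test-rule-name\ntype: frequency\n")
    # urllib.request.urlopen(req) would actually send it
    print(req.get_method(), req.full_url)
```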
The smtp_auth.yaml file, referenced by `smtp_auth_file` in the rule file, holds the SMTP credentials:
user: swtx_wuhan@163.com
password: sdwtyx234
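Next, configure the alert rule in ElastAlert. It scans the specified Elasticsearch index over the last 1 minute. When the number of log messages matching the query filter exceeds 5, it sends an alert email directly to fangchangtan@swtx.com. Note that the example rule below uses num_events: 1, so a single match already triggers the alert.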
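The triggering condition described here can be modelled as a sliding-window count. The Python sketch below is a simplified illustration of a frequency-style rule, not ElastAlert's implementation. It fires when at least `num_events` matching events fall inside `timeframe`, then starts counting afresh. `frequency_alerts` is a hypothetical helper.

```python
from collections import deque
from datetime import datetime, timedelta


def frequency_alerts(timestamps, num_events, timeframe):
    """Return the timestamps at which a frequency-style rule would fire.

    Keeps the matching events of the last `timeframe` in a deque and fires
    once it holds at least `num_events`; the window is cleared after firing
    so one burst produces one alert.
    """
    window = deque()
    fired = []
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop events that are older than `timeframe` relative to this one
        while window and ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            fired.append(ts)
            window.clear()
    return fired


if __name__ == "__main__":
    t0 = datetime(2020, 2, 24, 15, 0, 0)
    logs = [t0 + timedelta(seconds=s) for s in (0, 10, 20, 30, 40)]
    print(frequency_alerts(logs, 5, timedelta(minutes=1)))  # fires once, at t0 + 40 s
```

With `num_events=1`, as in the rule below, every matching event fires on its own.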
Below is /rules/tank-rules.yaml, the ElastAlert rule configuration file.
es_host: 172.19.32.106
es_port: 9202
# The rule name must be unique, otherwise an error is raised; it also becomes the subject of the alert email
## (Required)
## Rule name, must be unique
name: fct-test-rule-name
# The rule type, i.e. how the data is evaluated: any, blacklist, whitelist, change, frequency, spike, flatline, new_term, cardinality
# any: alert on every match;
# blacklist: the compare_key field matches any entry in the blacklist array;
# whitelist: the compare_key field matches none of the entries in the whitelist array;
# change: for the same query_key, the value of the compare_key field changes within timeframe;
# frequency: for the same query_key, num_events filtered anomalies occur within timeframe;
# spike: for the same query_key, the ratio between the data volumes of two consecutive timeframe windows exceeds spike_height. spike_type sets the direction: up, down or both. threshold_ref sets the minimum volume required in the previous window and threshold_cur the minimum in the current window; below these minimums no alert is triggered;
# flatline: the data volume within timeframe is below the threshold;
# new_term: the fields contain a value outside the top terms_size (default 50) terms seen within the preceding terms_window_size (default 30 days);
# cardinality: for the same query_key, the cardinality of cardinality_field within timeframe exceeds max_cardinality or falls below min_cardinality
## (Required)
## Type of alert.
## the frequency rule type alerts when num_events events occur with timeframe time
## I use frequency here; both conditions must hold: for the same query_key, num_events filtered anomalies occur within timeframe
type: frequency
# This index is the index as seen in Kibana; wildcards and multiple indices are supported, and a plain * also works
## (Required)
## Index to search, wildcard supported
index: fct-logstash*
# As soon as 1 event within the last 1 minute matches, the rule is satisfied and the alert fires
num_events: 1
timeframe:
  minutes: 1
# The key part: which keywords in the program's message should trigger an alert. This is an Elasticsearch query and supports AND, OR, etc.
filter:
- query:
    query_string:
      query: "UNKNOWN"
# The alert_text you define is shown in the email body
alert_text: "Hello, please reply to this email, Fang Changtan"
# Setup report smtp config
smtp_host: smtp.163.com
smtp_port: 25
smtp_ssl: False
#SMTP auth
from_addr: swtx_wuhan@163.com
email_reply_to: swtx_wuhan@163.com
smtp_auth_file: /opt/elastalert/config/smtp_auth.yaml
# (Required)
#
# The alert is used when a match is found
alert:
- "email"
# (required, email specific)
#
# a list of email addresses to send alerts to
email:
- "swtx_wuhan@163.com"
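To preview what the `filter` section matches before waiting for an alert, you can combine the query_string with a time range and send it to the Elasticsearch `_search` API by hand. The Python sketch below only builds such a request body. ElastAlert assembles its own query internally. The `@timestamp` field name is an assumption (it is ElastAlert's default `timestamp_field`), and `build_search_body` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta, timezone


def build_search_body(query_string, start, end, ts_field="@timestamp"):
    """Combine the rule's query_string filter with the time range of one run.

    Sketch only: an equivalent Elasticsearch bool/filter body for trying the
    filter by hand, not the exact query ElastAlert sends.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {ts_field: {"gt": start.isoformat(), "lte": end.isoformat()}}},
                    {"query_string": {"query": query_string}},
                ]
            }
        },
        "sort": [{ts_field: {"order": "asc"}}],
    }


if __name__ == "__main__":
    end = datetime(2020, 2, 24, 15, 44, tzinfo=timezone.utc)
    body = build_search_body("UNKNOWN", end - timedelta(minutes=1), end)
    # POST this JSON to http://172.19.32.106:9202/fct-logstash*/_search to preview matches
    print(json.dumps(body, indent=2))
```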
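Note: you need to register a 163 mailbox and enable the SMTP protocol for it:
Mailbox account: swtx_wuhan@163.com
Mailbox password: 221123.com
SMTP authorization password: swtx234
Enabling SMTP allows third-party clients to log in to and access the mailbox. Turn it on in the 163 mailbox settings.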
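You can reproduce the email side of the rule with Python's standard library to check the SMTP settings independently of ElastAlert. This is a hedged sketch:
- `build_alert_email` and `send_via_smtp` are hypothetical helpers.
- The subject simply uses the rule name, as the rule-file comment describes; ElastAlert's actual default subject may differ.
- Plain SMTP on port 25 mirrors `smtp_ssl: False`.

```python
import smtplib
from email.message import EmailMessage


def build_alert_email(rule_name, alert_text, from_addr, to_addrs, reply_to=None):
    """Compose an alert mail modelled on the rule: subject = rule name, body = alert_text."""
    msg = EmailMessage()
    msg["Subject"] = rule_name
    msg["From"] = from_addr
    msg["To"] = ", ".join(to_addrs)
    if reply_to:
        msg["Reply-To"] = reply_to
    msg.set_content(alert_text)
    return msg


def send_via_smtp(msg, host, port, user, password):
    """Send over plain SMTP with login (no TLS), mirroring smtp_ssl: False on port 25.

    `password` should be the SMTP authorization password, not the web-mail password.
    """
    with smtplib.SMTP(host, port, timeout=10) as server:
        server.login(user, password)
        server.send_message(msg)
```

Calling `send_via_smtp(msg, "smtp.163.com", 25, user, password)` with the SMTP authorization password is a quick way to tell a credentials problem apart from an ElastAlert configuration problem.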
Finally, restart elastalert so that the new configuration takes effect.
On the local test host 106, the command to run elastalert is:
docker run --rm \
--name fct-elastalert \
--net "host" \
-p 3030:3030 \
-v /data/poc/trial-production/myelastalert/elastalert/config/elastalert.yaml:/opt/elastalert/config.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/config/config.json:/opt/elastalert-server/config/config.json \
-v /data/poc/trial-production/myelastalert/elastalert/rules:/opt/elastalert/rules \
-v /data/poc/trial-production/myelastalert/elastalert/rule_templates:/opt/elastalert/rule_templates \
-v /data/poc/trial-production/myelastalert/elastalert/config/smtp_auth.yaml:/opt/elastalert/config/smtp_auth.yaml \
-v /data/poc/trial-production/myelastalert/elastalert/server_data:/opt/elastalert/server_data \
-v /data/poc/trial-production/myelastalert/elastalert/logs:/opt/logs \
bitsensor/elastalert:2.0.0
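To verify that ElastAlert alerts work, start logstash to send test data to Elasticsearch.
Start logstash locally on 172.19.32.67 for verification.
It receives log data from Kafka, filters it in logstash, and sends it to the fct-logstash_* indices in Elasticsearch: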
docker run \
--rm \
--name fct-alert-logstash \
-p 5047:5044 \
-v /root/fct/logstash-test/logstash_kafka.conf:/logstash/logstash_kafka.conf \
-v /root/fct/logstash-test/logstash.yml:/usr/share/logstash/config/logstash.yml \
registry.marathon.l4lb.thisdcos.directory:5000/logstash:6.6.1 \
logstash -f /logstash/logstash_kafka.conf
Output like the above indicates that the email was sent successfully!
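In the logs of the running elastalert service, the following errors may appear.
At runtime the log reports that the mailbox configuration is incorrect, so the mailbox connection must be configured correctly: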
15:43:43.085Z INFO elastalert-server: Router: Listening for GET request on /mapping/:index.
15:43:43.085Z INFO elastalert-server: Router: Listening for POST request on /search/:index.
15:43:43.090Z INFO elastalert-server: ProcessController: Starting ElastAlert
15:43:43.090Z INFO elastalert-server: ProcessController: Creating index
15:43:43.980Z INFO elastalert-server: ProcessController: Elastic Version:6
Mapping used for string:{'type': 'keyword'}
Index elastalert_status already exists. Skipping index creation.
15:43:43.980Z INFO elastalert-server: ProcessController: Index create exited with code 0
15:43:43.981Z INFO elastalert-server: ProcessController: Starting elastalert with arguments [none]
15:43:43.991Z INFO elastalert-server: ProcessController: Started Elastalert (PID: 50)
15:43:43.992Z INFO elastalert-server: Server: Server listening on port 3030
15:43:43.993Z INFO elastalert-server: Server: Websocket listening on port 3333
15:43:43.994Z INFO elastalert-server: Server: Server started
15:44:04.860Z ERROR elastalert-server: ProcessController: ERROR:root:Error while running alert email: Error connecting to SMTP host: Connection unexpectedly closed
15:48:06.886Z ERROR elastalert-server: ProcessController: WARNING:elasticsearch:GET http://172.19.32.106:9202/elastalert_status/elastalert/_search?size=10000 [status:400 request:0.012s]
15:48:06.886Z ERROR elastalert-server: ProcessController: ERROR:root:Error fetching aggregated matches: RequestError(400, u'search_phase_execution_exception', u'parse_exception: Encountered " "-" "- "" at line 1, column 13.\nWas expecting one of:\n <BAREOPER> ...\n "(" ...\n "*" ...\n <QUOTED> ...\n <TERM> ...\n <PREFIXTERM> ...\n <WILDTERM> ...\n <REGEXPTERM> ...\n "[" ...\n "{" ...\n <NUMBER> ...\n ')
15:48:26.972Z ERROR elastalert-server: ProcessController: ERROR:root:Error while running alert email: Error connecting to SMTP host: Connection unexpectedly closed
This error means the connection to the mail server could not be established; check that the configuration file is correct.
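When the server log is long, a few lines of Python can pull out just the SMTP-related errors. The sketch assumes the line format shown in the log above (`time LEVEL component: message`). `smtp_errors` and the regular expression are illustrative, not part of ElastAlert.

```python
import re

# Matches lines such as
# "15:44:04.860Z ERROR elastalert-server: ProcessController: ERROR:root:..."
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<component>[\w-]+):\s*(?P<message>.*)$"
)


def smtp_errors(lines):
    """Return (time, message) for ERROR lines whose message mentions SMTP."""
    found = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m["level"] == "ERROR" and "SMTP" in m["message"]:
            found.append((m["time"], m["message"]))
    return found
```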
SMTPDataError: (554, 'DT:SPM 163 smtp11,D8CowADn5mq2dFNewkQ5Aw--.52552S3 1582527670,please see http://mail.163.com/help/help_spam_16.htm?ip=58.49.28.162&hostid=smtp11&time=1582527670')
07:01:11.026Z ERROR elastalert-server: ProcessController: ERROR:root:Uncaught exception running rule fct-Example-rule-name: (554, 'DT:SPM 163 smtp11,D8CowADn5mq2dFNewkQ5Aw--.52552S3 1582527670,please see http://mail.163.com/help/help_spam_16.htm?ip=58.49.28.162&hostid=smtp11&time=1582527670')
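Here, 554 DT:SPM means that the message content contains unpermitted information or was identified by the system as spam. Check whether any user is sending viruses or spam.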
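If you script your own SMTP checks with `smtplib`, you can tell the two failures seen in this section apart from the exception type and reply code. In Python's `smtplib`, "Connection unexpectedly closed" is raised as `SMTPServerDisconnected`. A 554 reply to the message data is raised as `SMTPDataError` with `smtp_code` 554. `classify_smtp_failure` and its messages are hypothetical.

```python
import smtplib


def classify_smtp_failure(exc):
    """Map an smtplib exception to a short diagnosis (hypothetical helper)."""
    if isinstance(exc, smtplib.SMTPDataError) and exc.smtp_code == 554:
        reply = exc.smtp_error
        if isinstance(reply, bytes):
            reply = reply.decode("utf-8", "replace")
        if "DT:SPM" in reply:
            return "rejected as spam (554 DT:SPM): whitelist the sender or adjust the content"
        return "message rejected by server (554)"
    if isinstance(exc, smtplib.SMTPAuthenticationError):
        return "login failed: use the SMTP authorization password"
    if isinstance(exc, smtplib.SMTPServerDisconnected):
        return "connection closed: check smtp_host, smtp_port, smtp_ssl and credentials"
    return "unrecognised SMTP error"
```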
In this setup, the alerting program uses a NetEase 163 mailbox to send alert emails to a recipient group made up of two addresses, swtx_wuhan@163.com and fangchangtan@swtx.com.
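Solution:
1. In the 163 Mail web interface, go to "Settings" → "General Settings" → "Anti-spam / Blacklist and Whitelist". On the right-hand page, open the "Whitelist" tab and add swtx_wuhan@163.com to the whitelist.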
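Tip: so far this only gets the whole ELK alerting flow working end to end. ElastAlert's various rule types, and in particular a survey of alerting scenarios, have not been studied in depth; that is the next step.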
Appendix:
ElastAlert filter rules are as follows: