FELK Notes (Custom Email Templates for elastalert)

Posted on 2020-05-28, filed under FELK

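elastalert is an open-source tool for monitoring log keywords in Elasticsearch. It supports a large number of alerting methods, is highly customizable, and its code is very clearly written. I recently evaluated this library and did some secondary development on it to monitor keywords in our business logs, and it has worked out quite well. Here I reworked the email alert from the user's point of view, and hit quite a few pitfalls along the way.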

Next time I will cover the common monitoring rule types and how to hook up WeChat alerts.

Installation

pip3 install elastalert

Startup

Bare metal

/usr/bin/python -m elastalert.elastalert --rule /etc/elastalert/rules/my_rule.yaml --verbose

Container

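This is the Docker startup method given officially, but it is rather crude and not recommended; I suggest running it directly in k8s instead.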

git clone https://github.com/bitsensor/elastalert.git; cd elastalert
docker run -d -p 3030:3030 \
    -v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
    -v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
    -v `pwd`/rules:/opt/elastalert/rules \
    -v `pwd`/rule_templates:/opt/elastalert/rule_templates \
    --net="host" \
    --name elastalert bitsensor/elastalert:latest

k8s

Here I recommend elastalert-server, a library that adds a RESTful API on top of elastalert.

It directly supports the latest Python 3.8 + elastalert 2.0.4, and rules can be managed (CRUD) through its HTTP API.

Creating the indices

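elastalert creates the indices it needs automatically at startup. They can also be created beforehand with elastalert-create-index, which prompts for the settings: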

root@python-debug-5fdd9d9cd5-fkfkh:/usr/local/lib/python3.5# elastalert-create-index
Enter Elasticsearch host: 127.0.0.1
Enter Elasticsearch port: 9200
Use SSL? t/f: f
Enter optional basic-auth username (or leave blank): 
Enter optional basic-auth password (or leave blank): 
Enter optional Elasticsearch URL prefix (prepends a string to the URL of every request): 
New index name? (Default elastalert_status) realtysense.elastalert_status
New alias name? (Default elastalert_alerts) realtysense.elastalert_status
Name of existing index to copy? (Default None) 
Elastic Version: 5.4.1
Reading Elastic 5 index mappings:
Reading index mapping 'es_mappings/5/silence.json'
Reading index mapping 'es_mappings/5/elastalert_status.json'
Reading index mapping 'es_mappings/5/elastalert.json'
Reading index mapping 'es_mappings/5/past_elastalert.json'
Reading index mapping 'es_mappings/5/elastalert_error.json'
New index realtysense.elastalert_status created
Done!

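The five indices above store data produced while elastalert runs and are written straight back to ES. They are mainly used for computations such as aggregations and for keeping the log context in which a keyword was found; they also record data about elastalert itself.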

Hot-reloading the configuration

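The elastalert code uses the apscheduler library to check the hash of each rule file periodically. When the run interval elapses, it compares the rule hashes against the previous ones and, if they changed, updates the scheduled jobs through apscheduler (modify_job) to achieve hot reloading: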

        new_rule_hashes = self.rules_loader.get_hashes(self.conf, self.args.rule)
        # Check each current rule for changes
        for rule_file, hash_value in self.rule_hashes.items():
            if rule_file not in new_rule_hashes:
                # Rule file was deleted
                elastalert_logger.info('Rule file %s not found, stopping rule execution' % (rule_file))
                for rule in self.rules:
                    if rule['rule_file'] == rule_file:
                        break
                else:
                    continue
                self.scheduler.remove_job(job_id=rule['name'])
                self.rules.remove(rule)
                continue
            if hash_value != new_rule_hashes[rule_file]:
                # Rule file was changed, reload rule
                try:
                    new_rule = self.rules_loader.load_configuration(rule_file, self.conf)
                    if not new_rule:
                        logging.error('Invalid rule file skipped: %s' % rule_file)
                        continue
                    if 'is_enabled' in new_rule and not new_rule['is_enabled']:
                        elastalert_logger.info('Rule file %s is now disabled.' % (rule_file))
                        # Remove this rule if it's been disabled
                        self.rules = [rule for rule in self.rules if rule['rule_file'] != rule_file]
                        continue

So if the rule files are deployed through a k8s ConfigMap, you can simply edit the ConfigMap; no restart is needed.
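The hash comparison described above can be sketched in a few lines. This is a minimal standalone illustration, not elastalert's actual loader: the helper names get_hashes and diff_hashes, the use of SHA-1, and the *.yaml glob are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def get_hashes(rules_dir):
    """Map each rule file path to a digest of its current contents."""
    return {
        str(path): hashlib.sha1(path.read_bytes()).hexdigest()
        for path in sorted(Path(rules_dir).glob("*.yaml"))
    }


def diff_hashes(old, new):
    """Classify rule files as deleted, changed, or added between two scans."""
    deleted = sorted(f for f in old if f not in new)
    changed = sorted(f for f in old if f in new and old[f] != new[f])
    added = sorted(f for f in new if f not in old)
    return deleted, changed, added


with tempfile.TemporaryDirectory() as rules_dir:
    rule_a = Path(rules_dir, "a.yaml")
    rule_b = Path(rules_dir, "b.yaml")
    rule_a.write_text("name: a\n")
    rule_b.write_text("name: b\n")
    before = get_hashes(rules_dir)

    # Simulate an edited ConfigMap: a.yaml changes, b.yaml is removed, c.yaml appears.
    rule_a.write_text("name: a\nis_enabled: false\n")
    rule_b.unlink()
    Path(rules_dir, "c.yaml").write_text("name: c\n")
    after = get_hashes(rules_dir)

    deleted, changed, added = diff_hashes(before, after)
    summary = [Path(f).name for f in deleted + changed + added]
    print(summary)
```

A scheduler job would call the scan at a fixed interval and then remove, reload, or register the affected rules, which is why editing the mounted files is enough to pick up changes.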

Alerting methods

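elastalert supports a great many alerting methods, and officially supports secondary development of alerters as well: you only need to implement the corresponding class (see alerts for details). Here email is used as the example for customizing a template.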

Customizing the email format

Email subject

The default email subject has the format ElastAlert: index realty contain is invalid

Here index realty contain is invalid is the name defined in the rule file. The subject can be customized as follows:

alert_subject: "Issue {0} occurred at {1}"
alert_subject_args:  # these two arguments are substituted into the two placeholders above
- issue.name
- "@timestamp"

This uses Python's string placeholder substitution, similar to "{}, {}".format(xxx, yyy).
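To make the substitution concrete, here is a minimal sketch of how a subject like the one above could be assembled from a matched document. The lookup helper, the sample match document, and the "<MISSING VALUE>" fallback string are assumptions for illustration and not elastalert's own code.

```python
def lookup(match, dotted_key, missing="<MISSING VALUE>"):
    """Resolve a key such as 'issue.name' against a nested match document."""
    if dotted_key in match:  # top-level keys such as "@timestamp"
        return match[dotted_key]
    value = match
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return missing
        value = value[part]
    return value


# Hypothetical matched document, shaped to fit the rule snippet above.
match = {"issue": {"name": "disk-full"}, "@timestamp": "2020-05-28T08:00:00Z"}
alert_subject = "Issue {0} occurred at {1}"
alert_subject_args = ["issue.name", "@timestamp"]

# Each entry in alert_subject_args fills the placeholder with the same index.
subject = alert_subject.format(*(lookup(match, key) for key in alert_subject_args))
print(subject)  # Issue disk-full occurred at 2020-05-28T08:00:00Z
```

The order of alert_subject_args matters: the first entry fills {0}, the second fills {1}, and so on.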

Email body

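The officially supported email body is plain text, which is not pretty. Fortunately, the alert_text field in the rule file can specify the template to use, HTML is allowed, and parameters are again substituted into placeholders.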

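The problem is that every rule file then needs a big chunk of HTML template. The content barely differs between rules, yet the HTML still has to be modified according to how many parameters need to be displayed. So here I use a fixed HTML template style that adapts dynamically to the number of parameters, implemented with jinja2. The end result: users do not need to care about the template content and simply list the parameters in alert_text_args.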

alert:
- "email"
email_format: html
alert_text_type: alert_text_only # alert_text_only: do not send the default template content
alert_subject: "IDS Event From {0} Priority: {1}"
alert_subject_args:
- hostname
- priority
#alert_text: | # no template needs to be specified; jinja generates it dynamically from alert_text_args
#	<html content>
alert_text_args:
- msg
- "@timestamp"
- "@version"
- _id
- _index
- _type
- service_name
- kubernetes.host # nested JSON fields must be referenced with dots
- num_hits
- num_matches
email:
- "[email protected]"
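The idea of one fixed template that adapts to any number of alert_text_args can be sketched as follows. The post implements this with jinja2; the stand-in below deliberately uses only the Python standard library so it runs without extra packages, and the helper names resolve and render_rows as well as the sample match document are hypothetical. A jinja2 template would express the same loop with a for block over the fields.

```python
from functools import reduce
from html import escape


def resolve(match, dotted_key):
    """Follow 'kubernetes.host'-style keys into nested dicts; None if absent."""
    if dotted_key in match:
        return match[dotted_key]
    return reduce(
        lambda node, part: node.get(part) if isinstance(node, dict) else None,
        dotted_key.split("."),
        match,
    )


def render_rows(match, alert_text_args):
    """Build one table row per requested field, whatever the number of fields."""
    rows = [
        "<tr><th>{}</th><td>{}</td></tr>".format(
            escape(key), escape(str(resolve(match, key)))
        )
        for key in alert_text_args
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical matched document with one nested field.
match = {"msg": "a < b", "kubernetes": {"host": "node-1"}, "num_hits": 3}
html_body = render_rows(match, ["msg", "kubernetes.host", "num_hits"])
print(html_body)
```

Because the rows are generated from the argument list, adding or removing an entry in alert_text_args changes the table without touching any HTML.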

The resulting email looks like this:

Problems

Building the image fails with an invalid syntax error

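Cause: Python 3.7 needs jira 2.0 or later, but the default requirements.txt pins jira to a 1.x version, which is incompatible (most likely because async became a reserved keyword in Python 3.7).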

Fix: upgrade jira.

Startup fails with duplicate rule name

Cause: elastalert walks the rules directory recursively. If the rules are mounted from a Kubernetes ConfigMap, the files that end up under the rules directory look like this:

The ..2020_05_26_07_37_04.030814278 directory also contains a rule file with the same name, so elastalert reports a duplicate rule.

Fix: set scan_subdirectories: False in the elastalert configuration file so that subdirectories of the rules directory are not scanned.
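Why a recursive scan sees the rule twice can be reproduced with a small sketch that recreates the layout of a ConfigMap volume in a temporary directory. The find_rule_files helper is a hypothetical stand-in for a rule loader, not elastalert's implementation, and the example assumes a platform where symlinks can be created (for example Linux).

```python
import os
import tempfile


def find_rule_files(rules_dir, scan_subdirectories):
    """Collect *.yaml paths, optionally descending into subdirectories."""
    found = []
    for root, _dirs, files in os.walk(rules_dir):
        found.extend(
            os.path.join(root, name) for name in files if name.endswith(".yaml")
        )
        if not scan_subdirectories:
            break  # the first os.walk result is the top-level directory
    return sorted(found)


with tempfile.TemporaryDirectory() as rules_dir:
    # A timestamped directory holds the real file, "..data" points at that
    # directory, and the visible rule file is a symlink through "..data".
    snapshot = os.path.join(rules_dir, "..2020_05_26_07_37_04.030814278")
    os.mkdir(snapshot)
    with open(os.path.join(snapshot, "my_rule.yaml"), "w") as handle:
        handle.write("name: my rule\n")
    os.symlink(os.path.basename(snapshot), os.path.join(rules_dir, "..data"))
    os.symlink(
        os.path.join("..data", "my_rule.yaml"),
        os.path.join(rules_dir, "my_rule.yaml"),
    )

    recursive = find_rule_files(rules_dir, scan_subdirectories=True)
    top_level = find_rule_files(rules_dir, scan_subdirectories=False)
    print(len(recursive), len(top_level))
```

The recursive scan finds both the visible symlink and the copy inside the timestamped directory, while the top-level scan finds only the visible one.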

The following error occurs when sending email:

ERROR elastalert-server:
ProcessController:  ERROR:root:Traceback (most recent call last):
  File "/opt/elastalert/elastalert/elastalert.py", line 1450, in alert
    return self.send_alert(matches, rule, alert_time=alert_time, retried=retried)
  File "/opt/elastalert/elastalert/elastalert.py", line 1544, in send_alert
    alert.alert(matches)
  File "/opt/elastalert/elastalert/alerts.py", line 484, in alert
    body = self.create_alert_body(matches)
  File "/opt/elastalert/elastalert/alerts.py", line 291, in create_alert_body
    body += str(BasicMatchString(self.rule, match))
  File "/opt/elastalert/elastalert/alerts.py", line 171, in __str__
    self._add_custom_alert_text()
  File "/opt/elastalert/elastalert/alerts.py", line 99, in _add_custom_alert_text
    alert_text = alert_text.format(*alert_text_values)
KeyError: '\n            background-color'

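Cause: the HTML template uses curly braces as placeholders, but the style section of the HTML also uses curly braces. I first suspected jinja2 of mistaking the braces in the style for placeholders, but the traceback shows the error is raised by Python's str.format call in elastalert's _add_custom_alert_text, which treats every {...} in alert_text as a replacement field. A block like the following triggers it: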

<style>
  table#t01 tr:nth-child(even) {
     background-color: #eee;
  }
</style>

After I deleted the following part, the problem went away. This is a bit of a trap, so watch out for it.

{
     background-color: #eee;
 }
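The KeyError in the traceback above can be reproduced with plain str.format, which also suggests an alternative to deleting the CSS rule: doubling the literal braces. The sketch below is standalone; the template strings are shortened examples and not the actual template used in this post.

```python
css_template = """<style>
  table#t01 tr:nth-child(even) {
     background-color: #eee;
  }
</style>
<p>{0}</p>"""

try:
    css_template.format("alert body")
    error = None
except KeyError as exc:
    # str.format parses "{ background-color: ... }" as a replacement field
    # whose name is the text before the colon, and no such keyword exists.
    error = exc

# Doubling the braces marks them as literal characters for str.format.
escaped_template = """<style>
  table#t01 tr:nth-child(even) {{
     background-color: #eee;
  }}
</style>
<p>{0}</p>"""

rendered = escaped_template.format("alert body")
print(repr(error))
print(rendered)
```

With the doubled braces the style rule survives formatting and only {0} is substituted.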

Debugging

If the log shows 0 query hits, the most likely reason is that the translated ES query really does return 0 results:

ProcessController:  INFO:elastalert:Queried rule index realty find invalid keyword from 2020-05-27 06:55 UTC to 2020-05-27 07:00 UTC: 0 / 0 hits
    
07:00:51.888Z ERROR elastalert-server:
    ProcessController:  INFO:elastalert:Ran index realty find invalid keyword from 2020-05-27 06:55 UTC to 2020-05-27 07:00 UTC: 0 query hits (0 already seen), 0 matches, 0 alerts sent

You can turn on es_debug_trace to log the ES queries to a specified file, for example:

curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/demo.demo-*/_search?pretty&_source_include=%2A%2C%40timestamp&ignore_unavailable=true&scroll=30s&size=10' -d '{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gt": "2020-05-27T08:31:43.695209Z",
                  "lte": "2020-05-27T08:33:15.189058Z"
                }
              }
            },
            {
              "wildcard": { # pay special attention here
                "msg": "*unaryInterceptor*"
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    }
  ]
}'

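Note that in ES a wildcard query only works here with a lowercase string. This is a trap: the wildcard pattern is not analyzed, while the tokens indexed for an analyzed text field are typically lowercased, so a mixed-case pattern never matches. See zsk-blog for the explanation.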
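The lowercase requirement can be illustrated with a toy model in Python. This is only a rough simulation under simplifying assumptions: analyze stands in for an analyzer that splits on non-word characters and lowercases, and wildcard_hits compares the pattern case-sensitively against the stored tokens. Neither function is an Elasticsearch API, and the sample log line is made up.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Rough stand-in for an analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def wildcard_hits(indexed_tokens, pattern):
    """Case-sensitive wildcard match against stored tokens, not the raw text."""
    return [token for token in indexed_tokens if fnmatchcase(token, pattern)]


tokens = analyze("grpc unaryInterceptor panic recovered")
mixed_case_hits = wildcard_hits(tokens, "*unaryInterceptor*")
lower_case_hits = wildcard_hits(tokens, "*unaryinterceptor*")
print(mixed_case_hits, lower_case_hits)
```

In this model the original casing is gone by the time the pattern is compared, so only the lowercase pattern finds the token.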

Using keyword arguments

Besides positional arguments, keyword arguments are also supported in the template, as follows:

type: any
filter:
    - term:
        _type: elastalert_error
index: elastalert_status
alert_subject: "Error on rule elastalert "
alert_text_kw:
     data: data
alert_text: |
    Error elastalert :
    - {data}
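How the rule above could turn into an alert body can be sketched with str.format and keyword arguments. The sample match document and the "<MISSING VALUE>" fallback are assumptions, and the direction of the alert_text_kw mapping (match field name to placeholder name) reflects my reading of elastalert and should be checked against the version in use.

```python
# Hypothetical matched document from the elastalert_status index.
match = {"data": "ConnectionTimeout: es not reachable", "rule_name": "demo"}

alert_text = "Error elastalert :\n- {data}"
alert_text_kw = {"data": "data"}  # match field name -> placeholder name

# Build the keyword arguments that fill the named placeholders.
kwargs = {
    placeholder: match.get(field, "<MISSING VALUE>")
    for field, placeholder in alert_text_kw.items()
}
body = alert_text.format(**kwargs)
print(body)
```

Named placeholders keep longer templates readable, because the template no longer depends on the order of an argument list.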

Adding es_debug_trace support to elastalert-server

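By default elastalert-server exposes only a few of elastalert's command-line options; currently only the two debugging flags --verbose and --debug are supported. elastalert has another option, es_debug_trace, that the elastalert-server configuration file does not support, which is inconvenient when you need to debug ES queries, so the source has to be modified to support it: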

elastalert-server/src/controllers/process/index.js

// Pass --es_debug_trace through to ElastAlert when it is set in config.json.
if (config.get('es_debug_trace') !== undefined && config.get('es_debug_trace') !== '') {
  logger.info('Setting ElastAlert es_debug_trace mode. Enable logging from Elasticsearch queries as curl command. Queries will be logged to file.');
  startArguments.push('--es_debug_trace', config.get('es_debug_trace'));
}

Then add "es_debug_trace": "/tmp/" to elastalert-server/config/config.json.

References:

https://github.com/Yelp/elastalert/issues/2690

https://github.com/nsano-rururu/elastalert-server

https://chunlife.top/2019/03/27/es%E5%91%8A%E8%AD%A6%E5%8A%9F%E8%83%BD%E2%80%94%E2%80%94elastalert/

https://segmentfault.com/a/1190000017553282

http://www.mamicode.com/info-detail-2269787.html