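Logstash consumes data from Kafka and forwards it over UDP. The logs in Kafka are JSON, and the data we need sits under the formatlog key, so Logstash is used to extract the contents of formatlog.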
- input { kafka {
- bootstrap_servers => "10.10.10.101:9092" # may also be a Kafka cluster, e.g. "10.10.10.101:9092,10.10.10.102:9092,10.10.10.103:9092"
- group_id => "host_log"
- client_id => "logstash1" # note: when several Logstash instances consume the same topics, each one needs a different client_id
- auto_offset_reset => "latest"
- topics => ["host"]
- add_field => {"logs_type" => "host"}
- codec => json { charset => "UTF-8" }
- }
-
- kafka {
- bootstrap_servers => "10.10.10.101:9092"
- group_id => "vpn_log"
- client_id => "logstash2" # use a distinct client_id for each kafka input
- auto_offset_reset => "latest"
- topics => ["vpn"]
- add_field => {"logs_type" => "vpn"}
- codec => json { charset => "UTF-8" }
- }
-
- }
-
- filter { mutate {
- remove_field => ["@version","host","@timestamp","type"] # drop unneeded fields
- replace => {"message" => "%{[formatlog]}"} # rewrite message so it keeps only the formatlog part of the JSON
- }
- }
-
- output {
- #stdout{}
- if[logs_type] == "host" {
- syslog {
- appname => "host"
- host => "127.0.0.1"
- port => "8001"
- protocol => "udp"
- }
- }
-
- if[logs_type] == "vpn" {
- syslog {
- appname => "vpn"
- host => "127.0.0.1"
- port => "8002"
- protocol => "udp"
- }
- }
- }
-
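To make the data flow of the configuration above concrete, here is a minimal Python sketch of what happens to one record: parse the JSON, keep only formatlog, and send it as a UDP datagram. The function names and the sample fields in the usage note are hypothetical, and the sketch sends the bare message, whereas the Logstash syslog output also wraps it in a syslog header.

```python
import json
import socket


def extract_formatlog(raw_record: str) -> str:
    """Return only the formatlog part of one JSON log record as a string."""
    event = json.loads(raw_record)
    payload = event["formatlog"]
    # A nested object is re-serialized so the forwarded message is plain text.
    if isinstance(payload, str):
        return payload
    return json.dumps(payload, ensure_ascii=False)


def forward_udp(message: str, host: str, port: int) -> None:
    """Send one message as a single UDP datagram (no syslog header added)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))
```

For example, `forward_udp(extract_formatlog('{"formatlog": "login ok", "host": "a"}'), "127.0.0.1", 8001)` would send only `login ok` to the local port used for the host logs.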
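Explanation: the configuration above sets group_id. group_id is a string that uniquely identifies a consumer group, and consumers with the same group_id form one consumer group. To make Logstash highly available, start several Logstash processes with the same group_id: if one Logstash goes down, the others in the same group keep consuming.
Notes:
When several Logstash instances consume the same topics, the topic must have more than one partition, and the number of Logstash instances should not exceed the number of partitions. Kafka assigns each partition to only one consumer within a group, so any extra consumers sit idle.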
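The partition rule can be illustrated with a small simulation. This is a simplified round-robin sketch, not Kafka's actual assignor (the real strategies differ in detail), but it keeps the property that matters here: each partition has exactly one owner in the group. The consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Give partition i to consumer i modulo the group size (simplified round robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def idle_consumers(assignment):
    """Consumers that received no partition and therefore read nothing."""
    return [consumer for consumer, owned in assignment.items() if not owned]
```

With a single partition and two Logstash instances, `idle_consumers(assign_partitions([0], ["logstash1", "logstash2"]))` reports that the second instance has nothing to read. It still serves as a standby if the first one fails.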
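Kafka commands for checking topics and consumer group status on the server
The bootstrap-server (i.e. broker) address used in the commands below is 10.10.10.101:9092.
Run the following commands from the local Kafka client installation directory: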
bin/kafka-topics.sh --bootstrap-server 10.10.10.101:9092 --list
bin/kafka-topics.sh --bootstrap-server 10.10.10.101:9092 --describe --topic vpn
bin/kafka-consumer-groups.sh --bootstrap-server 10.10.10.101:9092 --list
bin/kafka-consumer-groups.sh --bootstrap-server 10.10.10.101:9092 --group vpn_log --describe
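The last command shows, in order: the group name, the consumed topic, the partition id, the offset last committed by the consumer group (CURRENT-OFFSET), the offset of the last produced message (LOG-END-OFFSET), the difference between the two (LAG), and the id of the group member currently consuming that topic-partition.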
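The lag column is simply the produced offset minus the committed offset for each partition. A minimal sketch of that arithmetic follows; the class and function names are hypothetical, and any numbers passed in are illustrative values rather than real command output.

```python
from dataclasses import dataclass


@dataclass
class PartitionStatus:
    partition: int
    current_offset: int   # last offset committed by the consumer group
    log_end_offset: int   # offset of the last produced message

    @property
    def lag(self) -> int:
        """Messages produced but not yet consumed on this partition."""
        return self.log_end_offset - self.current_offset


def total_lag(rows) -> int:
    """Sum the lag over all partitions of a topic."""
    return sum(row.lag for row in rows)
```

A lag that keeps growing across repeated runs of the describe command suggests the consumers are not keeping up with the producers.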
./bin/kafka-console-consumer.sh --bootstrap-server 10.10.10.101:9092 --topic vpn --from-beginning
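When Logstash consumes from Kafka, the consumer_threads option sets how many threads read from Kafka, i.e. how many consumers read in parallel. A larger value generally speeds up reading, but threads beyond the number of partitions stay idle, and too large a value can degrade system performance.
The worker count, by contrast, sets how many pipeline workers run in parallel, i.e. how many threads filter and process events at the same time. It is a pipeline setting (pipeline.workers in logstash.yml, or -w on the command line) rather than a kafka input option. A larger value increases processing capacity, but again, too large a value can degrade performance.
In short, consumer_threads tunes how fast data is read from Kafka, while the worker count tunes Logstash's overall processing capacity.
Example: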
- input {
- kafka {
- bootstrap_servers => "192.168.10.153:9092"
- group_id => "logstash_test"
- auto_offset_reset => "latest"
- topics => ["log_info"]
- consumer_threads => 2
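- # workers is not a kafka input option; set pipeline.workers in logstash.yml or pass -w on the command line
- codec => json { # use the json codec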
- charset => "UTF-8"
- }
- }
- }
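The split between reader threads and worker threads can be sketched as a two-stage pipeline joined by a queue. This is only an analogy for how the two settings relate, not how Logstash is implemented; `run_pipeline` and its parameters are hypothetical names.

```python
import queue
import threading


def run_pipeline(partitions, consumer_threads, workers, transform):
    """Read records with `consumer_threads` readers, process them with `workers` workers."""
    buffer = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def reader(index):
        # Reader `index` owns every partition at position index, index + n, index + 2n, ...
        for partition in partitions[index::consumer_threads]:
            for record in partition:
                buffer.put(record)

    def worker():
        while True:
            record = buffer.get()
            if record is None:  # sentinel: no more records
                return
            processed = transform(record)
            with results_lock:
                results.append(processed)

    readers = [threading.Thread(target=reader, args=(i,)) for i in range(consumer_threads)]
    pool = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in readers + pool:
        thread.start()
    for thread in readers:
        thread.join()
    for _ in pool:
        buffer.put(None)  # one sentinel per worker, queued after all records
    for thread in pool:
        thread.join()
    return results
```

Raising `consumer_threads` only helps while there are partitions left to hand out, and raising `workers` only helps while the processing stage is the bottleneck.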
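The queue.type setting in Logstash selects the queue type. Two types are currently supported: memory and persisted.
memory: the queue lives in memory only. It suits smaller data volumes, and buffered events are lost if Logstash crashes.
persisted: the queue is stored in files on disk. It suits larger data volumes and cases where buffered events must survive a restart.
The default queue.type is memory. To use a persisted queue, set queue.type: persisted in logstash.yml. The queue files go under path.queue, which defaults to the queue directory under path.data, so setting a path explicitly is optional.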
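The practical difference between the two queue types is what survives a restart. The sketch below is only an analogy: Logstash's real persisted queue uses page files and checkpoints, while this hypothetical `PersistedQueue` just appends JSON lines to a file.

```python
import json
import os


class MemoryQueue:
    """Events live only in this object; a new process starts empty."""

    def __init__(self):
        self._items = []

    def push(self, event):
        self._items.append(event)

    def pending(self):
        return list(self._items)


class PersistedQueue:
    """Events are appended to a file, so a new process can read them back."""

    def __init__(self, path):
        self.path = path

    def push(self, event):
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(event) + "\n")

    def pending(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]
```

Creating a second `PersistedQueue` on the same path stands in for a restart: it still sees the events pushed earlier, whereas a fresh `MemoryQueue` does not.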
- filter {
- ruby {
- code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
- }
- ruby {
- code => "event.set('@timestamp',event.get('timestamp'))"
- }
- mutate {
- remove_field => ["timestamp"]
- }
-
- }
- filter {
- date {
- match => ["time", "yyyy-MM-dd HH:mm:ss"]
- target => "@timestamp"
- }
-
- ruby {
- code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
- }
- ruby {
- code => "event.set('@timestamp',event.get('timestamp'))"
- }
- mutate {
- remove_field => ["timestamp"]
- }
-
- }
- filter {
- date {
- match => ["time", "yyyy-MM-dd HH:mm:ss"]
- target => "timetest"
- }
-
- ruby {
- code => "event.set('daytime', ( event.get('timetest').time.localtime + 8*60*60).strftime('%Y-%m-%d'))"
- }
-
- mutate {
- remove_field => ["timetest"]
- }
-
- }
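The three filter blocks above share one idea: add eight hours to a UTC timestamp so that the stored value reads as UTC+8 wall-clock time, and in the last block format the shifted value as a day string. A minimal Python sketch of that arithmetic, with hypothetical function names:

```python
from datetime import datetime, timedelta, timezone

UTC8_SHIFT = timedelta(hours=8)


def shift_timestamp(ts_utc: datetime) -> datetime:
    """Move the instant forward by eight hours, as the ruby filters do."""
    return ts_utc + UTC8_SHIFT


def daytime(ts_utc: datetime) -> str:
    """Day string of the shifted timestamp, like the daytime field."""
    return shift_timestamp(ts_utc).strftime("%Y-%m-%d")
```

Note that the shifted value is still labelled as UTC, so it no longer denotes the true instant. An event at 17:00 UTC on 1 January, for instance, ends up with a daytime of 2 January.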
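Output to Elasticsearch using a custom template.
The configuration below takes the day from @timestamp to create the index dynamically
and uses the itemId field as the document id.
The flush_size and idle_flush_time options together control how Logstash sends bulk data to Elasticsearch. In the example below, Logstash tries to accumulate 5 events and send them in one request, but if 5 events have not accumulated within 5 seconds, it sends whatever it has. Since 5.0 there is a further condition: flush_size cannot exceed the batch_size set through the Logstash command-line options; if it does, batch_size is used as the bulk size.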
- output {
- elasticsearch {
- flush_size => 5
- idle_flush_time => 5
- hosts => ["http://192.168.10.153:9200"]
- index => "log_info-%{+YYYY.MM.dd}"
- document_type => "log_type"
- document_id => "%{itemId}"
- template => "/root/logstash-5.4.1/config/temp_log_info.json" # path to the Elasticsearch template
- template_name => "log_info_tmp" # name of the Elasticsearch template
- template_overwrite => true
- }
- stdout {
- codec => json_lines
- }
- }
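The interplay of flush_size, idle_flush_time and batch_size described for this output can be sketched as a small batching class. It models the described behaviour only and is not the plugin's code; the `Batcher` class and its injected `now` clock are hypothetical, chosen so the logic can be run deterministically.

```python
class Batcher:
    """Flush when enough events are buffered or when the buffer has waited too long."""

    def __init__(self, flush_size, idle_flush_time, batch_size=None):
        # flush_size is capped by batch_size when one is given.
        self.limit = flush_size if batch_size is None else min(flush_size, batch_size)
        self.idle_flush_time = idle_flush_time
        self.pending = []
        self.last_flush = 0.0
        self.sent = []  # every flushed batch, in order

    def _flush(self, now):
        if self.pending:
            self.sent.append(self.pending)
            self.pending = []
        self.last_flush = now

    def add(self, event, now):
        """Buffer one event and flush as soon as the size limit is reached."""
        self.pending.append(event)
        if len(self.pending) >= self.limit:
            self._flush(now)

    def tick(self, now):
        """Periodic check: flush a partial batch once it has waited long enough."""
        if self.pending and now - self.last_flush >= self.idle_flush_time:
            self._flush(now)
```

With `Batcher(5, 5)`, five quick events go out as one batch, while a lone sixth event is sent only after a later `tick` finds that five seconds have passed since the last flush.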
- {
- "template":"log_info*",
- "mappings":{
- "article":{
- "dynamic":"strict",
- "_all":{
- "enabled":false
- },
- "properties":{
- "title":{
- "type":"string",
- "index":"analyzed",
- "analyzer":"ik_max_word",
- "search_analyzer":"ik_max_word"
- },
- "author":{
- "type":"string",
- "index":"no"
- },
- "itemId":{
- "type":"long"
- },
- "site":{
- "type":"keyword"
- },
- "time":{
- "type":"date",
- "index":"not_analyzed",
- "format":"yyyy-MM-dd HH:mm:ss"
- }
- }
- }
- }
- }
- input {
- file {
- path => "/usr/local/my.log"
- start_position => "beginning"
- type => "infolog"
- sincedb_path => "/dev/null"
- }
- file {
- path => "/usr/local/my1.log"
- start_position => "beginning"
- type => "errlog"
- sincedb_path => "/dev/null"
- }
-
- }
- filter {
- json {
- source => "message"
- }
- date {
- match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"] # parse the timestamp field
- target => "@timestamp" # write the parsed value into the @timestamp field
- }
- }
-
- output {
- if [type] == "infolog" {
- elasticsearch {
- hosts => ["test:9200"]
- index => "infolog-%{+YYYY.MM.dd}"
- }
- } else if [type] == "errlog" {
- elasticsearch {
- hosts => ["test:9200"]
- index => "errlog-%{+YYYY.MM.dd}"
- }
- }
-
- }
./bin/logstash -f ./config/test.conf
bin/logstash -f apache.config --config.reload.automatic
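Edit config/logstash.yml:
path.data: /path/to/data/directory
Note: when setting path.data, make sure the Logstash process has read and write permission on that directory. If you run several Logstash instances, each one needs its own path.data directory to avoid data conflicts.
Writing to Elasticsearch with authentication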
- input {
- kafka {
- bootstrap_servers => "kafka_host:9092" # replace with the Kafka host and port
- topics => ["topic_name"] # replace with the Kafka topic to consume
- group_id => "logstash_consumer"
- codec => json
- }
- }
-
- output {
- elasticsearch {
- hosts => ["http://elasticsearch_host:9200"] # replace with the Elasticsearch host and port
- user => "aaa" # Elasticsearch user name
- password => "ccc" # Elasticsearch password
- index => "your_index_name" # replace with the Elasticsearch index to write to
- document_id => "%{id}" # replace with the JSON field that holds the document id
- }
- }
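Setting document_id from a field of the event makes writes idempotent: indexing a document whose id already exists replaces the stored copy instead of adding a duplicate. That matters when Kafka messages are consumed more than once. A minimal sketch, with a dict standing in for the index and a hypothetical helper name:

```python
def index_document(index: dict, doc: dict, id_field: str = "id") -> None:
    """Store doc under its id; a second write with the same id replaces the first."""
    index[str(doc[id_field])] = doc


def document_count(index: dict) -> int:
    """Number of distinct documents currently stored."""
    return len(index)
```

Without an explicit document_id, Elasticsearch generates a new id for every write, so a replayed message would appear twice.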
Start a producer:
./bin/kafka-console-producer.sh --bootstrap-server 192.168.10.153:9092 --topic log_info
Insert test data:
{"title":"aa","author":"bbbb","itemId":12335,"site":"dafadf","time":"2023-01-01 01:00:00"}
# For bulk testing, piping a file is more convenient
cat log.txt | ./bin/kafka-console-producer.sh --bootstrap-server 192.168.10.153:9092 --topic log_info
# Create the templates
curl -u 'elastic:xxx' -X PUT --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://10.x.x.x:9200/_template/tmp_news' -d@tmp_news.json
- {
- "template":"news_*",
- "aliases": {
- "news_total": {}
- },
- "mappings": {
- "properties": {
- "content":{
- "type":"text",
- "analyzer":"ik_max_word",
- "search_analyzer":"ik_max_word"
- },
- "data_id":{
- "type":"keyword"
- },
- "uid":{
- "type":"keyword"
- },
- "group_id":{
- "type":"keyword"
- },
- "pubtime":{
- "type":"date",
- "format":"yyyy-MM-dd HH:mm:ss"
- },
- "insert_time":{
- "type":"date",
- "format":"yyyy-MM-dd HH:mm:ss"
- }
- }
- }
-
- }
curl -u 'elastic:xxx' -X PUT --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://10.x.x.x:9200/_template/tmp_tg' -d@tmp_tg.json
- {
- "template":"tg_*",
- "aliases": {
- "tg_total": {}
- },
- "mappings": {
- "properties": {
- "content":{
- "type":"text",
- "analyzer":"ik_max_word",
- "search_analyzer":"ik_max_word"
- },
- "data_id":{
- "type":"keyword"
- },
- "uid":{
- "type":"keyword"
- },
- "group_id":{
- "type":"keyword"
- },
- "pubtime":{
- "type":"date",
- "format":"yyyy-MM-dd HH:mm:ss"
- },
- "insert_time":{
- "type":"date",
- "format":"yyyy-MM-dd HH:mm:ss"
- }
- }
- }
-
- }
# Test whether the template takes effect (create a test index)
curl -u 'elastic:xxx' -X PUT --header 'Content-Type: application/json' --header 'Accept: application/json' 'http://10.x.x.x:9200/news_test'
# List the indices
curl -u 'elastic:xxx' -XGET 'http://10.x.x.x:9200/_cat/indices?v'
# View the mapping
curl -u 'elastic:xxx' -XGET 'http://10.x.x.x:9200/news_test/_mapping?pretty'
# View the index aliases
curl -u 'elastic:xxx' -XGET '10.x.x.x:9200/news_test/_alias'
# View the template
curl -u 'elastic:xxx' -XGET 'http://10.x.x.x:9200/_template/tmp_tg?pretty'
# Delete the template
curl -u 'elastic:xxx' -XDELETE 10.x.x.x:9200/_template/tmp_tg
# Prepare the Logstash configs
- input {
- kafka {
- bootstrap_servers => "10.x.x.x:9092"
- topics => ["line_new_3"]
- group_id => "logstash_consumer_news"
- codec => json
- }
- }
-
- filter {
- ruby {
- code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
- }
- ruby {
- code => "event.set('@timestamp',event.get('timestamp'))"
- }
- mutate {
- remove_field => ["timestamp"]
- }
-
- }
-
- output {
- elasticsearch {
- hosts => ["http://10.x.x.x:9200"]
- index => "news_%{+YYYY-MM}"
- user => "elastic"
- password => "xxx"
- document_id => "%{data_id}"
- template => "/data/es7/tmp_news.json" # path to the Elasticsearch template
- template_name => "tmp_news" # name of the Elasticsearch template
- template_overwrite => true
- }
- stdout {
- codec => json_lines
- }
- }
- input {
- kafka {
- bootstrap_servers => "10.x.x.x:9092"
- topics => ["tggv1_3"]
- group_id => "logstash_consumer_tg"
- codec => json
- }
- }
-
- filter {
- ruby {
- code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
- }
- ruby {
- code => "event.set('@timestamp',event.get('timestamp'))"
- }
- mutate {
- remove_field => ["timestamp"]
- }
-
- }
-
- output {
- elasticsearch {
- hosts => ["http://10.x.x.x:9200"]
- index => "tg_%{+YYYY-MM}"
- user => "elastic"
- password => "xxx"
- document_id => "%{data_id}"
- template => "/data/es7/tmp_tg.json" # path to the Elasticsearch template
- template_name => "tmp_tg" # name of the Elasticsearch template
- template_overwrite => true
- }
- stdout {
- codec => json_lines
- }
- }
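Both outputs above derive a monthly index name from @timestamp, and the templates apply to any index whose name matches their pattern (news_* or tg_*). The sketch below shows how the two pieces line up. The function names are hypothetical, and `fnmatchcase` is used only as a stand-in for Elasticsearch's `*` wildcard matching.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def monthly_index(prefix: str, timestamp: datetime) -> str:
    """Build an index name such as news_2023-08 from the event timestamp."""
    return f"{prefix}_{timestamp.strftime('%Y-%m')}"


def template_applies(index_name: str, pattern: str) -> bool:
    """True when the index name matches the template's wildcard pattern."""
    return fnmatchcase(index_name, pattern)
```

Because the filter shifts @timestamp by eight hours first, an event late on the last UTC day of a month lands in the next month's index.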
# Start Logstash with each config
- ./bin/logstash -f config/news_tmp.conf
- ./bin/logstash -f config/tg_tmp.conf
# List the indices
curl -u 'elastic:xxx' -XGET 'http://10.x.x.x:9200/_cat/indices?v'
# Check that the index was created correctly
curl -u 'elastic:xxx' -XGET 'http://10.x.x.x:9200/news_2023-08/_mapping?pretty'
# Check that the data was written correctly
curl -u 'elastic:xxx' -X GET "http://10.x.x.x:9200/news_2023-08/_doc/005eadb0b289abef5f02d553bb07f164"