Flume's two collection modes
1. Collect whole files from a directory: the agent watches a spooling directory, and whenever a new file appears there, the entire file is collected into HDFS.
The configuration file is as follows:
# Name the three components
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Configure the source component
agent1.sources.source1.type = spooldir
# Collect every new file that appears in this directory
agent1.sources.source1.spoolDir = /home/hadoop/logs/
agent1.sources.source1.fileHeader = false

# Configure the interceptor
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname

# Configure the sink component
agent1.sinks.sink1.type = hdfs
# Collected data is written to this HDFS path
agent1.sinks.sink1.hdfs.path = hdfs://hdp-node-01:9000/weblog/flume-collection/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = access_log
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
agent1.sinks.sink1.hdfs.batchSize = 100
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.rollSize = 102400
agent1.sinks.sink1.hdfs.rollCount = 1000000
agent1.sinks.sink1.hdfs.rollInterval = 60
#agent1.sinks.sink1.hdfs.round = true
#agent1.sinks.sink1.hdfs.roundValue = 10
#agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
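The three roll* settings above control when the HDFS sink closes the current file and starts a new one: the file is rolled as soon as any one of the thresholds (bytes written, event count, or seconds open) is reached. A minimal sketch of that decision, using the values from this config (the helper function is illustrative, not Flume's actual code):

```python
# Sketch of the HDFS sink roll decision: a file rolls when ANY
# configured threshold is reached; a value of 0 disables that check.
# Defaults mirror rollSize / rollCount / rollInterval above.
# This function is hypothetical, not Flume source code.

def should_roll(bytes_written, event_count, seconds_open,
                roll_size=102400, roll_count=1000000, roll_interval=60):
    if roll_size and bytes_written >= roll_size:
        return True
    if roll_count and event_count >= roll_count:
        return True
    if roll_interval and seconds_open >= roll_interval:
        return True
    return False

print(should_roll(50_000, 10, 30))       # all thresholds unmet
print(should_roll(150_000, 10, 30))      # rollSize exceeded
print(should_roll(50_000, 10, 61))       # rollInterval exceeded
```

Because rolling is an OR over the three conditions, setting a threshold very high (as rollCount is here) effectively leaves the other two in charge.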
Startup
From the Flume installation directory, run:
bin/flume-ng agent -c conf -f conf/onetest.conf -n agent1 -Dflume.root.logger=INFO,console
2. Tail a single file: since our log entries are all appended to one file, this mode points at that file and collects every newly appended line into HDFS. The configuration file is as follows:
# Name the three components
agent2.sources = source1
agent2.sinks = sink1
agent2.channels = channel1

# Describe/configure the tail -F source
agent2.sources.source1.type = exec
agent2.sources.source1.command = tail -F /usr/local/flume/testflume2/mylog.log

# Configure the host interceptor for the source
agent2.sources.source1.interceptors = i1
agent2.sources.source1.interceptors.i1.type = host
agent2.sources.source1.interceptors.i1.hostHeader = hostname

# Describe sink1
agent2.sinks.sink1.type = hdfs
agent2.sinks.sink1.hdfs.path = hdfs://192.168.11.134:9000/weblog/flume-collection2/%y-%m-%d/%H-%M
agent2.sinks.sink1.hdfs.filePrefix = access_log
agent2.sinks.sink1.hdfs.maxOpenFiles = 5000
agent2.sinks.sink1.hdfs.batchSize = 100
agent2.sinks.sink1.hdfs.fileType = DataStream
agent2.sinks.sink1.hdfs.writeFormat = Text
agent2.sinks.sink1.hdfs.rollSize = 102400
agent2.sinks.sink1.hdfs.rollCount = 1000000
agent2.sinks.sink1.hdfs.rollInterval = 60
agent2.sinks.sink1.hdfs.round = true
agent2.sinks.sink1.hdfs.roundValue = 10
agent2.sinks.sink1.hdfs.roundUnit = minute
agent2.sinks.sink1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory
agent2.channels.channel1.type = memory
agent2.channels.channel1.keep-alive = 120
agent2.channels.channel1.capacity = 500000
agent2.channels.channel1.transactionCapacity = 600

# Bind the source and sink to the channel
agent2.sources.source1.channels = channel1
agent2.sinks.sink1.channel = channel1
Startup
Again from the Flume installation directory, run:
bin/flume-ng agent -c conf -f conf/twotest.conf -n agent2 -Dflume.root.logger=INFO,console
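In both configs, the escape sequences in hdfs.path bucket events into time-based directories; with useLocalTimeStamp = true the timestamp is taken from the local clock rather than from an event header. These particular escapes (%y, %m, %d, %H, %M) behave like strftime format codes, so the expansion can be sketched as follows (the helper function is illustrative, not part of Flume):

```python
# Sketch: how the %y-%m-%d/%H-%M escapes in hdfs.path expand into
# per-minute directories. These escapes behave like strftime codes,
# so strftime is enough for illustration; expand_hdfs_path itself
# is hypothetical, not a Flume API.
from datetime import datetime

def expand_hdfs_path(template, ts):
    return ts.strftime(template)

template = "hdfs://192.168.11.134:9000/weblog/flume-collection2/%y-%m-%d/%H-%M"
ts = datetime(2023, 5, 4, 9, 30)
print(expand_hdfs_path(template, ts))
# hdfs://192.168.11.134:9000/weblog/flume-collection2/23-05-04/09-30
```

Every event in the same minute therefore lands in the same directory, and the round/roundValue/roundUnit settings in the second config coarsen this further by rounding the timestamp down to the nearest 10 minutes before expansion.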