- Single data flow model
- Multiple data flow model
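Within a single Agent, one Source, one Channel, and one Sink form a single data flow model. As the figure shows, the complete data flow is Web Server --> Source --> Channel --> Sink --> HDFS.
A few principles matter in the data flow models that Flume provides.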
Source --> Channel
1. A single Source can feed multiple Channels, using either replicating or multiplexing.
2. Multiple Sources can write to a single Channel.
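The fan-out rule above can be sketched in the same properties format. This is a minimal sketch rather than a configuration from the original article: the agent name a1, the source r1, the channels c1 to c3, and the header name state with its values CZ and US are all placeholder assumptions. The selector keys follow Flume's channel selector settings, where replicating is the default when no selector type is set.

```properties
# Hypothetical names: agent a1, source r1, channels c1 c2 c3
a1.sources = r1
a1.channels = c1 c2 c3

# Option 1, replicating: every event is copied to all listed channels
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
# c3 is marked optional: a failed write to it is ignored
a1.sources.r1.selector.optional = c3

# Option 2, multiplexing: route each event on the value of a header
# (use instead of option 1, not together with it)
# a1.sources.r1.selector.type = multiplexing
# a1.sources.r1.selector.header = state
# a1.sources.r1.selector.mapping.CZ = c1
# a1.sources.r1.selector.mapping.US = c2 c3
# a1.sources.r1.selector.default = c3
```

With multiplexing, events whose header value matches no mapping go to the default channel.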
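Channel --> Sink
1. Multiple Sinks can be combined into a sink group (sinkgroups) that takes data from a Channel, using either load balancing or failover.
2. Multiple Sinks can also take data from a single Channel.
3. A single Sink can take data from only one Channel.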
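The sink-side rules above might look like the following in an agent file. This is a sketch under assumptions: the agent a1, the sinks k1 and k2, the channel c1, and the group name g1 are hypothetical, and the priority and penalty values are illustrative only. The property keys are those of Flume's sink processors.

```properties
# Hypothetical names: agent a1, sinks k1 k2, channel c1, sink group g1
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2

# Both sinks drain the same channel; each sink names exactly one channel
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

# Option 1, failover: the available sink with the highest priority value is used
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Upper limit, in milliseconds, on the backoff period of a failed sink
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Option 2, load balancing (use instead of option 1)
# a1.sinkgroups.g1.processor.type = load_balance
# a1.sinkgroups.g1.processor.backoff = true
# a1.sinkgroups.g1.processor.selector = round_robin
```

Note the singular key channel on each sink, in contrast to the plural channels on a source.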
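Using the five principles above, you can design a data flow model that meets your needs.
To define the flow within a single agent, you link the sources and sinks through channels. List the sources, sinks, and channels for the given agent, then point each source and each sink at a channel. A source instance can specify multiple channels, but a sink instance can specify only one. The format is as follows: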
# list the sources, sinks and channels for the agent
<Agent>.sources = <Source>
<Agent>.channels = <Channel1> <Channel2>
<Agent>.sinks = <Sink>
# set channel for source
<Agent>.sources.<Source>.channels = <Channel1> <Channel2> ...
# set channel for sink
<Agent>.sinks.<Sink>.channel = <Channel1>
For example:
# list the sources, sinks and channels for the agent
agent_foo.sources = avro-appserver-src-1
agent_foo.channels = mem-channel-1
agent_foo.sinks = hdfs-sink-1
# set channel for source
agent_foo.sources.avro-appserver-src-1.channels = mem-channel-1
# set channel for sink
agent_foo.sinks.hdfs-sink-1.channel = mem-channel-1
# properties for sources
<Agent>.sources.<Source>.<someProperty> = <someValue>
# properties for channels
<Agent>.channels.<Channel>.<someProperty> = <someValue>
# properties for sinks
<Agent>.sinks.<Sink>.<someProperty> = <someValue>
For example:
agent_foo.sources = avro-AppSrv-source
agent_foo.sinks = hdfs-Cluster1-sink
agent_foo.channels = mem-channel-1
# set channel for sources, sinks
# properties of avro-AppSrv-source
agent_foo.sources.avro-AppSrv-source.type = avro
agent_foo.sources.avro-AppSrv-source.bind = localhost
agent_foo.sources.avro-AppSrv-source.port = 10000
# properties of mem-channel-1
agent_foo.channels.mem-channel-1.type = memory
agent_foo.channels.mem-channel-1.capacity = 1000
agent_foo.channels.mem-channel-1.transactionCapacity = 100
# properties of hdfs-Cluster1-sink
agent_foo.sinks.hdfs-Cluster1-sink.type = hdfs
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.path = hdfs://namenode/flume/webdata
#...
# Avro source
avro
# Syslog TCP source
syslogtcp
# Syslog UDP source
syslogudp
# HTTP source
http
# Exec source
exec
# JMS source
jms
# Thrift source
thrift
# Spooling directory source (watches a directory)
spooldir
# Kafka source
org.apache.flume.source.kafka.KafkaSource
.....
# Memory Channel
memory
# JDBC Channel
jdbc
# Kafka Channel
org.apache.flume.channel.kafka.KafkaChannel
# File Channel
file
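A type name from the list above goes into the channel's type property, and each type then takes its own settings. As a hedged illustration, here is a file channel declaration; the agent a1, the channel c1, and both directory paths are placeholder assumptions, not values from the original article.

```properties
# Hypothetical names: agent a1, channel c1; the paths are placeholders
a1.channels = c1
a1.channels.c1.type = file
# Directory where checkpoint files are stored
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
# Comma-separated list of directories for the event data files
a1.channels.c1.dataDirs = /mnt/flume/data
```

Unlike the memory channel, the file channel persists events to disk, so the choice trades throughput for durability.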
# HDFS Sink
hdfs
# Hive Sink
hive
# Logger Sink
logger
# Avro Sink
avro
# Kafka Sink
org.apache.flume.sink.kafka.KafkaSink
# HBase Sink
hbase
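Components without a short alias are selected by their fully qualified class name. The sketch below shows this for the Kafka sink, assuming a hypothetical agent a1, sink k1, channel c1, a broker at localhost:9092, and a topic named mytopic. The kafka.* keys are the ones used by recent Flume 1.x releases; older releases used different key names, so check the documentation for your version.

```properties
# Hypothetical names: agent a1, sink k1, channel c1
a1.sinks = k1
a1.sinks.k1.channel = c1
# No short alias exists, so the full class name is the type
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Placeholder broker address and topic
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = mytopic
```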