
Big Data Technology Practice

I. Flume

JDK version: 1.8.0_211

Flume version: 1.8.0

Download: omitted

Configuration:

  • System environment variables
    • export FLUME_HOME=/usr/local/flume/apache-flume-1.8.0-bin
    • export FLUME_CONF_DIR=$FLUME_HOME/conf
    • Append $FLUME_HOME/bin to PATH
  • Set JAVA_HOME in conf/flume-env.sh
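
With the variables above in place (e.g. appended to ~/.bashrc and sourced; the exact profile file is your choice), the install can be sanity-checked from any shell:

    export FLUME_HOME=/usr/local/flume/apache-flume-1.8.0-bin
    export FLUME_CONF_DIR=$FLUME_HOME/conf
    export PATH=$PATH:$FLUME_HOME/bin
    # Should print the Flume version banner (1.8.0)
    flume-ng version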

1. Using Flume to receive messages from AvroSource

  • conf/avro.conf

    a1.sources=r1
    a1.sinks=k1
    a1.channels=c1
    #Describe/configure the source
    a1.sources.r1.type=avro
    a1.sources.r1.channels=c1
    a1.sources.r1.bind=0.0.0.0
    a1.sources.r1.port=4141
    #Describe the sink
    a1.sinks.k1.type=logger
    #Use a channel which buffers events in memory
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=1000
    a1.channels.c1.transactionCapacity=100
    #Bind the source and sink to the channel
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1

  • Start the agent (console logging):

    /usr/local/flume/apache-flume-1.8.0-bin/bin/flume-ng agent -c . -f /usr/local/flume/apache-flume-1.8.0-bin/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console

  • In another shell, create a file under the Flume home directory:

    sh -c 'echo "hello, world" > /usr/local/flume/apache-flume-1.8.0-bin/log.00'

  • Send the file to the Avro source:

    ./bin/flume-ng avro-client --conf conf -H localhost -p 4141 -F /usr/local/flume/apache-flume-1.8.0-bin/log.00

  • Watch the first shell: the event has been logged to the console.
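
Per the flume-ng avro-client usage, --filename defaults to standard input, so (assuming that behavior) another event can be pushed without creating a file:

    echo "hello again" | ./bin/flume-ng avro-client --conf conf -H localhost -p 4141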

2. Using Flume to receive messages from NetcatSource

  • conf/example.conf

    #example.conf: A single-node Flume configuration
    #Name the components on this agent
    a1.sources=r1
    a1.sinks=k1
    a1.channels=c1
    #Describe/configure the source
    a1.sources.r1.type=netcat
    a1.sources.r1.bind=localhost
    a1.sources.r1.port=44444
    #Describe the sink
    a1.sinks.k1.type=logger
    #Use a channel which buffers events in memory
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=1000
    a1.channels.c1.transactionCapacity=100
    #Bind the source and sink to the channel
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1

  • Start the agent (console logging):

    /usr/local/flume/apache-flume-1.8.0-bin/bin/flume-ng agent --conf ./conf --conf-file ./conf/example.conf --name a1 -Dflume.root.logger=INFO,console

  • In another shell, connect to the source:

    telnet localhost 44444

  • Whatever you type in this shell shows up in the agent's console.
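
If telnet is not installed, nc (netcat) speaks to the netcat source the same way; each line typed becomes one event and the source answers OK:

    nc localhost 44444
    # or non-interactively:
    echo "hello from nc" | nc localhost 44444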

3. Using Flume to collect a local file

  • conf/exec1.conf

    #Name the components on this agent
    a1.sources=r1
    a1.sinks=k1
    a1.channels=c1
    #For each source, the type is defined
    a1.sources.r1.type=exec
    a1.sources.r1.command=tail -F /usr/local/hadoop/hadoop-2.7.7/logs/hadoop-root-datanode-bigdata.log
    #Shell used to run the command (path from `whereis bash`)
    a1.sources.r1.shell=/usr/bin/bash -c
    #Each sink's type must be defined
    a1.sinks.k1.type=hdfs
    a1.sinks.k1.hdfs.path=hdfs://bigdata:9000/flume/%y%m%d/%H
    a1.sinks.k1.hdfs.filePrefix=logs-
    a1.sinks.k1.hdfs.round=true
    a1.sinks.k1.hdfs.roundValue=1
    a1.sinks.k1.hdfs.roundUnit=minute
    a1.sinks.k1.hdfs.useLocalTimeStamp=true
    a1.sinks.k1.hdfs.batchSize=100
    a1.sinks.k1.hdfs.fileType=DataStream
    a1.sinks.k1.hdfs.rollInterval=30
    a1.sinks.k1.hdfs.rollSize=134217700
    a1.sinks.k1.hdfs.rollCount=0
    a1.sinks.k1.hdfs.minBlockReplicas=1
    #Specify the channel to use
    a1.channels.c1.type=memory
    a1.channels.c1.capacity=1000
    a1.channels.c1.transactionCapacity=100
    #Bind the source and sink to the channel
    a1.sources.r1.channels=c1
    a1.sinks.k1.channel=c1
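
Note that the exec source cannot pick up where it left off if the agent restarts. Flume 1.7+ also ships a TAILDIR source that persists its read position; a minimal sketch of the source half only, keeping the same channel and sink (the positionFile location is an arbitrary choice here):

    a1.sources.r1.type = TAILDIR
    a1.sources.r1.positionFile = /usr/local/flume/taildir_position.json
    a1.sources.r1.filegroups = f1
    a1.sources.r1.filegroups.f1 = /usr/local/hadoop/hadoop-2.7.7/logs/hadoop-root-datanode-bigdata.log
    a1.sources.r1.channels = c1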

  • Start the agent (console logging):

    /usr/local/flume/apache-flume-1.8.0-bin/bin/flume-ng agent --conf ./conf --conf-file ./conf/exec1.conf --name a1 -Dflume.root.logger=INFO,console

  • Check the results in HDFS.
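
For a quick look, assuming the agent has already rolled at least one file (the %y%m%d/%H directory names depend on when the sink wrote, so list first, then cat what is actually there):

    hdfs dfs -ls -R /flume
    # Rolled files are named logs-<timestamp>; print their contents
    hdfs dfs -cat /flume/*/*/logs-*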

4. Using Flume to collect a local directory

  • conf/spooldir1.conf

    #Name the agent's source, channel, and sink
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    #Describe the source
    a1.sources.r1.type = spooldir
    a1.sources.r1.spoolDir = /usr/local/flume/apache-flume-1.8.0-bin/logs
    #Describe the channel
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    #Describe the sink
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://bigdata:9000/flume/%Y%m%d
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    #Do not roll files by event count
    a1.sinks.k1.hdfs.rollCount = 0
    #Roll a new file once it reaches ~128 MB
    a1.sinks.k1.hdfs.rollSize = 134217700
    #Roll a new file every 30 seconds
    a1.sinks.k1.hdfs.rollInterval = 30
    #Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

  • Start the agent (console logging):

    /usr/local/flume/apache-flume-1.8.0-bin/bin/flume-ng agent --conf ./conf --conf-file ./conf/spooldir1.conf --name a1 -Dflume.root.logger=INFO,console

  • Check the results in HDFS.
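
A quick end-to-end check, assuming the spoolDir above exists. Files must be fully written before they land in the spool directory (hence writing to /tmp first), and the source renames each ingested file with the default .COMPLETED suffix:

    mkdir -p /usr/local/flume/apache-flume-1.8.0-bin/logs
    echo "spooldir test" > /tmp/test.log
    mv /tmp/test.log /usr/local/flume/apache-flume-1.8.0-bin/logs/
    # After ingestion the file is renamed test.log.COMPLETED
    ls /usr/local/flume/apache-flume-1.8.0-bin/logs
    hdfs dfs -ls /flume/$(date +%Y%m%d)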